<!doctype html>
<html style='font-size:20px !important'>
<head>
<meta charset='UTF-8'><meta name='viewport' content='width=device-width, initial-scale=1'>
<title>7.chapter_seven</title><style type='text/css'>html {overflow-x: initial !important;}:root { --bg-color:#ffffff; --text-color:#333333; --select-text-bg-color:#B5D6FC; --select-text-font-color:auto; --monospace:"Lucida Console",Consolas,"Courier",monospace; }
html { font-size: 14px; background-color: var(--bg-color); color: var(--text-color); font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; -webkit-font-smoothing: antialiased; }
body { margin: 0px; padding: 0px; height: auto; bottom: 0px; top: 0px; left: 0px; right: 0px; font-size: 1rem; line-height: 1.42857; overflow-x: hidden; background: inherit; tab-size: 4; }
iframe { margin: auto; }
a.url { word-break: break-all; }
a:active, a:hover { outline: 0px; }
.in-text-selection, ::selection { text-shadow: none; background: var(--select-text-bg-color); color: var(--select-text-font-color); }
#write { margin: 0px auto; height: auto; width: inherit; word-break: normal; overflow-wrap: break-word; position: relative; white-space: normal; overflow-x: visible; padding-top: 40px; }
#write.first-line-indent p { text-indent: 2em; }
#write.first-line-indent li p, #write.first-line-indent p * { text-indent: 0px; }
#write.first-line-indent li { margin-left: 2em; }
.for-image #write { padding-left: 8px; padding-right: 8px; }
body.typora-export { padding-left: 30px; padding-right: 30px; }
.typora-export .footnote-line, .typora-export li, .typora-export p { white-space: pre-wrap; }
.typora-export .task-list-item input { pointer-events: none; }
@media screen and (max-width: 500px) {
  body.typora-export { padding-left: 0px; padding-right: 0px; }
  #write { padding-left: 20px; padding-right: 20px; }
  .CodeMirror-sizer { margin-left: 0px !important; }
  .CodeMirror-gutters { display: none !important; }
}
#write li > figure:last-child { margin-bottom: 0.5rem; }
#write ol, #write ul { position: relative; }
img { max-width: 100%; vertical-align: middle; image-orientation: from-image; }
button, input, select, textarea { color: inherit; font: inherit; }
input[type="checkbox"], input[type="radio"] { line-height: normal; padding: 0px; }
*, ::after, ::before { box-sizing: border-box; }
#write h1, #write h2, #write h3, #write h4, #write h5, #write h6, #write p, #write pre { width: inherit; }
#write h1, #write h2, #write h3, #write h4, #write h5, #write h6, #write p { position: relative; }
p { line-height: inherit; }
h1, h2, h3, h4, h5, h6 { break-after: avoid-page; break-inside: avoid; orphans: 4; }
p { orphans: 4; }
h1 { font-size: 2rem; }
h2 { font-size: 1.8rem; }
h3 { font-size: 1.6rem; }
h4 { font-size: 1.4rem; }
h5 { font-size: 1.2rem; }
h6 { font-size: 1rem; }
.md-math-block, .md-rawblock, h1, h2, h3, h4, h5, h6, p { margin-top: 1rem; margin-bottom: 1rem; }
.hidden { display: none; }
.md-blockmeta { color: rgb(204, 204, 204); font-weight: 700; font-style: italic; }
a { cursor: pointer; }
sup.md-footnote { padding: 2px 4px; background-color: rgba(238, 238, 238, 0.7); color: rgb(85, 85, 85); border-radius: 4px; cursor: pointer; }
sup.md-footnote a, sup.md-footnote a:hover { color: inherit; text-transform: inherit; text-decoration: inherit; }
#write input[type="checkbox"] { cursor: pointer; width: inherit; height: inherit; }
figure { overflow-x: auto; margin: 1.2em 0px; max-width: calc(100% + 16px); padding: 0px; }
figure > table { margin: 0px; }
tr { break-inside: avoid; break-after: auto; }
thead { display: table-header-group; }
table { border-collapse: collapse; border-spacing: 0px; width: 100%; overflow: auto; break-inside: auto; text-align: left; }
table.md-table td { min-width: 32px; }
.CodeMirror-gutters { border-right: 0px; background-color: inherit; }
.CodeMirror-linenumber { user-select: none; }
.CodeMirror { text-align: left; }
.CodeMirror-placeholder { opacity: 0.3; }
.CodeMirror pre { padding: 0px 4px; }
.CodeMirror-lines { padding: 0px; }
div.hr:focus { cursor: none; }
#write pre { white-space: pre-wrap; }
#write.fences-no-line-wrapping pre { white-space: pre; }
#write pre.ty-contain-cm { white-space: normal; }
.CodeMirror-gutters { margin-right: 4px; }
.md-fences { font-size: 0.9rem; display: block; break-inside: avoid; text-align: left; overflow: visible; white-space: pre; background: inherit; position: relative !important; }
.md-diagram-panel { width: 100%; margin-top: 10px; text-align: center; padding-top: 0px; padding-bottom: 8px; overflow-x: auto; }
#write .md-fences.mock-cm { white-space: pre-wrap; }
.md-fences.md-fences-with-lineno { padding-left: 0px; }
#write.fences-no-line-wrapping .md-fences.mock-cm { white-space: pre; overflow-x: auto; }
.md-fences.mock-cm.md-fences-with-lineno { padding-left: 8px; }
.CodeMirror-line, twitterwidget { break-inside: avoid; }
.footnotes { opacity: 0.8; font-size: 0.9rem; margin-top: 1em; margin-bottom: 1em; }
.footnotes + .footnotes { margin-top: 0px; }
.md-reset { margin: 0px; padding: 0px; border: 0px; outline: 0px; vertical-align: top; background: 0px 0px; text-decoration: none; text-shadow: none; float: none; position: static; width: auto; height: auto; white-space: nowrap; cursor: inherit; -webkit-tap-highlight-color: transparent; line-height: normal; font-weight: 400; text-align: left; box-sizing: content-box; direction: ltr; }
li div { padding-top: 0px; }
blockquote { margin: 1rem 0px; }
li .mathjax-block, li p { margin: 0.5rem 0px; }
li { margin: 0px; position: relative; }
blockquote > :last-child { margin-bottom: 0px; }
blockquote > :first-child, li > :first-child { margin-top: 0px; }
.footnotes-area { color: rgb(136, 136, 136); margin-top: 0.714rem; padding-bottom: 0.143rem; white-space: normal; }
#write .footnote-line { white-space: pre-wrap; }
@media print {
  body, html { border: 1px solid transparent; height: 99%; break-after: avoid; break-before: avoid; font-variant-ligatures: no-common-ligatures; }
  #write { margin-top: 0px; padding-top: 0px; border-color: transparent !important; }
  .typora-export * { -webkit-print-color-adjust: exact; }
  html.blink-to-pdf { font-size: 13px; }
  .typora-export #write { break-after: avoid; }
  .typora-export #write::after { height: 0px; }
  .is-mac table { break-inside: avoid; }
}
.footnote-line { margin-top: 0.714em; font-size: 0.7em; }
a img, img a { cursor: pointer; }
pre.md-meta-block { font-size: 0.8rem; min-height: 0.8rem; white-space: pre-wrap; background: rgb(204, 204, 204); display: block; overflow-x: hidden; }
p > .md-image:only-child:not(.md-img-error) img, p > img:only-child { display: block; margin: auto; }
#write.first-line-indent p > .md-image:only-child:not(.md-img-error) img { left: -2em; position: relative; }
p > .md-image:only-child { display: inline-block; width: 100%; }
#write .MathJax_Display { margin: 0.8em 0px 0px; }
.md-math-block { width: 100%; }
.md-math-block:not(:empty)::after { display: none; }
[contenteditable="true"]:active, [contenteditable="true"]:focus, [contenteditable="false"]:active, [contenteditable="false"]:focus { outline: 0px; box-shadow: none; }
.md-task-list-item { position: relative; list-style-type: none; }
.task-list-item.md-task-list-item { padding-left: 0px; }
.md-task-list-item > input { position: absolute; top: 0px; left: 0px; margin-left: -1.2em; margin-top: calc(1em - 10px); border: none; }
.math { font-size: 1rem; }
.md-toc { min-height: 3.58rem; position: relative; font-size: 0.9rem; border-radius: 10px; }
.md-toc-content { position: relative; margin-left: 0px; }
.md-toc-content::after, .md-toc::after { display: none; }
.md-toc-item { display: block; color: rgb(65, 131, 196); }
.md-toc-item a { text-decoration: none; }
.md-toc-inner:hover { text-decoration: underline; }
.md-toc-inner { display: inline-block; cursor: pointer; }
.md-toc-h1 .md-toc-inner { margin-left: 0px; font-weight: 700; }
.md-toc-h2 .md-toc-inner { margin-left: 2em; }
.md-toc-h3 .md-toc-inner { margin-left: 4em; }
.md-toc-h4 .md-toc-inner { margin-left: 6em; }
.md-toc-h5 .md-toc-inner { margin-left: 8em; }
.md-toc-h6 .md-toc-inner { margin-left: 10em; }
@media screen and (max-width: 48em) {
  .md-toc-h3 .md-toc-inner { margin-left: 3.5em; }
  .md-toc-h4 .md-toc-inner { margin-left: 5em; }
  .md-toc-h5 .md-toc-inner { margin-left: 6.5em; }
  .md-toc-h6 .md-toc-inner { margin-left: 8em; }
}
a.md-toc-inner { font-size: inherit; font-style: inherit; font-weight: inherit; line-height: inherit; }
.footnote-line a:not(.reversefootnote) { color: inherit; }
.md-attr { display: none; }
.md-fn-count::after { content: "."; }
code, pre, samp, tt { font-family: var(--monospace); }
kbd { margin: 0px 0.1em; padding: 0.1em 0.6em; font-size: 0.8em; color: rgb(36, 39, 41); background: rgb(255, 255, 255); border: 1px solid rgb(173, 179, 185); border-radius: 3px; box-shadow: rgba(12, 13, 14, 0.2) 0px 1px 0px, rgb(255, 255, 255) 0px 0px 0px 2px inset; white-space: nowrap; vertical-align: middle; }
.md-comment { color: rgb(162, 127, 3); opacity: 0.8; font-family: var(--monospace); }
code { text-align: left; vertical-align: initial; }
a.md-print-anchor { white-space: pre !important; border-width: initial !important; border-style: none !important; border-color: initial !important; display: inline-block !important; position: absolute !important; width: 1px !important; right: 0px !important; outline: 0px !important; background: 0px 0px !important; text-decoration: initial !important; text-shadow: initial !important; }
.md-inline-math .MathJax_SVG .noError { display: none !important; }
.html-for-mac .inline-math-svg .MathJax_SVG { vertical-align: 0.2px; }
.md-math-block .MathJax_SVG_Display { text-align: center; margin: 0px; position: relative; text-indent: 0px; max-width: none; max-height: none; min-height: 0px; min-width: 100%; width: auto; overflow-y: hidden; display: block !important; }
.MathJax_SVG_Display, .md-inline-math .MathJax_SVG_Display { width: auto; margin: inherit; display: inline-block !important; }
.MathJax_SVG .MJX-monospace { font-family: var(--monospace); }
.MathJax_SVG .MJX-sans-serif { font-family: sans-serif; }
.MathJax_SVG { display: inline; font-style: normal; font-weight: 400; line-height: normal; zoom: 90%; text-indent: 0px; text-align: left; text-transform: none; letter-spacing: normal; word-spacing: normal; overflow-wrap: normal; white-space: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0px; min-height: 0px; border: 0px; padding: 0px; margin: 0px; }
.MathJax_SVG * { transition: none 0s ease 0s; }
.MathJax_SVG_Display svg { vertical-align: middle !important; margin-bottom: 0px !important; margin-top: 0px !important; }
.os-windows.monocolor-emoji .md-emoji { font-family: "Segoe UI Symbol", sans-serif; }
.md-diagram-panel > svg { max-width: 100%; }
[lang="flow"] svg, [lang="mermaid"] svg { max-width: 100%; height: auto; }
[lang="mermaid"] .node text { font-size: 1rem; }
table tr th { border-bottom: 0px; }
video { max-width: 100%; display: block; margin: 0px auto; }
iframe { max-width: 100%; width: 100%; border: none; }
.highlight td, .highlight tr { border: 0px; }
svg[id^="mermaidChart"] { line-height: 1em; }
mark { background: rgb(255, 255, 0); color: rgb(0, 0, 0); }
.md-html-inline .md-plain, .md-html-inline strong, mark .md-inline-math, mark strong { color: inherit; }
mark .md-meta { color: rgb(0, 0, 0); opacity: 0.3 !important; }
@media print {
  .typora-export h1, .typora-export h2, .typora-export h3, .typora-export h4, .typora-export h5, .typora-export h6 { break-inside: avoid; }
}


.CodeMirror { height: auto; }
.CodeMirror.cm-s-inner { background: inherit; }
.CodeMirror-scroll { overflow: auto hidden; z-index: 3; }
.CodeMirror-gutter-filler, .CodeMirror-scrollbar-filler { background-color: rgb(255, 255, 255); }
.CodeMirror-gutters { border-right: 1px solid rgb(221, 221, 221); background: inherit; white-space: nowrap; }
.CodeMirror-linenumber { padding: 0px 3px 0px 5px; text-align: right; color: rgb(153, 153, 153); }
.cm-s-inner .cm-keyword { color: rgb(119, 0, 136); }
.cm-s-inner .cm-atom, .cm-s-inner.cm-atom { color: rgb(34, 17, 153); }
.cm-s-inner .cm-number { color: rgb(17, 102, 68); }
.cm-s-inner .cm-def { color: rgb(0, 0, 255); }
.cm-s-inner .cm-variable { color: rgb(0, 0, 0); }
.cm-s-inner .cm-variable-2 { color: rgb(0, 85, 170); }
.cm-s-inner .cm-variable-3 { color: rgb(0, 136, 85); }
.cm-s-inner .cm-string { color: rgb(170, 17, 17); }
.cm-s-inner .cm-property { color: rgb(0, 0, 0); }
.cm-s-inner .cm-operator { color: rgb(152, 26, 26); }
.cm-s-inner .cm-comment, .cm-s-inner.cm-comment { color: rgb(170, 85, 0); }
.cm-s-inner .cm-string-2 { color: rgb(255, 85, 0); }
.cm-s-inner .cm-meta { color: rgb(85, 85, 85); }
.cm-s-inner .cm-qualifier { color: rgb(85, 85, 85); }
.cm-s-inner .cm-builtin { color: rgb(51, 0, 170); }
.cm-s-inner .cm-bracket { color: rgb(153, 153, 119); }
.cm-s-inner .cm-tag { color: rgb(17, 119, 0); }
.cm-s-inner .cm-attribute { color: rgb(0, 0, 204); }
.cm-s-inner .cm-header, .cm-s-inner.cm-header { color: rgb(0, 0, 255); }
.cm-s-inner .cm-quote, .cm-s-inner.cm-quote { color: rgb(0, 153, 0); }
.cm-s-inner .cm-hr, .cm-s-inner.cm-hr { color: rgb(153, 153, 153); }
.cm-s-inner .cm-link, .cm-s-inner.cm-link { color: rgb(0, 0, 204); }
.cm-negative { color: rgb(221, 68, 68); }
.cm-positive { color: rgb(34, 153, 34); }
.cm-header, .cm-strong { font-weight: 700; }
.cm-del { text-decoration: line-through; }
.cm-em { font-style: italic; }
.cm-link { text-decoration: underline; }
.cm-error { color: red; }
.cm-invalidchar { color: red; }
.cm-constant { color: rgb(38, 139, 210); }
.cm-defined { color: rgb(181, 137, 0); }
div.CodeMirror span.CodeMirror-matchingbracket { color: rgb(0, 255, 0); }
div.CodeMirror span.CodeMirror-nonmatchingbracket { color: rgb(255, 34, 34); }
.cm-s-inner .CodeMirror-activeline-background { background: inherit; }
.CodeMirror { position: relative; overflow: hidden; }
.CodeMirror-scroll { height: 100%; outline: 0px; position: relative; box-sizing: content-box; background: inherit; }
.CodeMirror-sizer { position: relative; }
.CodeMirror-gutter-filler, .CodeMirror-hscrollbar, .CodeMirror-scrollbar-filler, .CodeMirror-vscrollbar { position: absolute; z-index: 6; display: none; }
.CodeMirror-vscrollbar { right: 0px; top: 0px; overflow: hidden; }
.CodeMirror-hscrollbar { bottom: 0px; left: 0px; overflow: hidden; }
.CodeMirror-scrollbar-filler { right: 0px; bottom: 0px; }
.CodeMirror-gutter-filler { left: 0px; bottom: 0px; }
.CodeMirror-gutters { position: absolute; left: 0px; top: 0px; padding-bottom: 30px; z-index: 3; }
.CodeMirror-gutter { white-space: normal; height: 100%; box-sizing: content-box; padding-bottom: 30px; margin-bottom: -32px; display: inline-block; }
.CodeMirror-gutter-wrapper { position: absolute; z-index: 4; background: 0px 0px !important; border: none !important; }
.CodeMirror-gutter-background { position: absolute; top: 0px; bottom: 0px; z-index: 4; }
.CodeMirror-gutter-elt { position: absolute; cursor: default; z-index: 4; }
.CodeMirror-lines { cursor: text; }
.CodeMirror pre { border-radius: 0px; border-width: 0px; background: 0px 0px; font-family: inherit; font-size: inherit; margin: 0px; white-space: pre; overflow-wrap: normal; color: inherit; z-index: 2; position: relative; overflow: visible; }
.CodeMirror-wrap pre { overflow-wrap: break-word; white-space: pre-wrap; word-break: normal; }
.CodeMirror-code pre { border-right: 30px solid transparent; width: fit-content; }
.CodeMirror-wrap .CodeMirror-code pre { border-right: none; width: auto; }
.CodeMirror-linebackground { position: absolute; left: 0px; right: 0px; top: 0px; bottom: 0px; z-index: 0; }
.CodeMirror-linewidget { position: relative; z-index: 2; overflow: auto; }
.CodeMirror-wrap .CodeMirror-scroll { overflow-x: hidden; }
.CodeMirror-measure { position: absolute; width: 100%; height: 0px; overflow: hidden; visibility: hidden; }
.CodeMirror-measure pre { position: static; }
.CodeMirror div.CodeMirror-cursor { position: absolute; visibility: hidden; border-right: none; width: 0px; }
.CodeMirror div.CodeMirror-cursor { visibility: hidden; }
.CodeMirror-focused div.CodeMirror-cursor { visibility: inherit; }
.cm-searching { background: rgba(255, 255, 0, 0.4); }
@media print {
  .CodeMirror div.CodeMirror-cursor { visibility: hidden; }
}


/* Flowchart variables */
/* Sequence Diagram variables */
/* Gantt chart variables */
/* state colors */
.label {
  
  color: #333; }

.label text {
  fill: #333; }

.node rect,
.node circle,
.node ellipse,
.node polygon {
  fill: #BDD5EA;
  stroke: #9370DB;
  stroke-width: 1px; }

.node .label {
  text-align: center; }

.node.clickable {
  cursor: pointer; }

.arrowheadPath {
  fill: lightgrey; }

.edgePath .path {
  stroke: lightgrey;
  stroke-width: 1.5px; }

.edgeLabel {
  background-color: #e8e8e8;
  text-align: center; }

.cluster rect {
  fill: #6D6D65;
  stroke: rgba(255, 255, 255, 0.25);
  stroke-width: 1px; }

.cluster text {
  fill: #F9FFFE; }

div.mermaidTooltip {
  position: absolute;
  text-align: center;
  max-width: 200px;
  padding: 2px;
  
  font-size: 12px;
  background: #6D6D65;
  border: 1px solid rgba(255, 255, 255, 0.25);
  border-radius: 2px;
  pointer-events: none;
  z-index: 100; }

.actor {
  stroke: #81B1DB;
  fill: #BDD5EA; }

text.actor {
  fill: black;
  stroke: none; }

.actor-line {
  stroke: lightgrey; }

.messageLine0 {
  stroke-width: 1.5;
  stroke-dasharray: '2 2';
  stroke: lightgrey; }

.messageLine1 {
  stroke-width: 1.5;
  stroke-dasharray: '2 2';
  stroke: lightgrey; }

#arrowhead {
  fill: lightgrey; }

.sequenceNumber {
  fill: white; }

#sequencenumber {
  fill: lightgrey; }

#crosshead path {
  fill: lightgrey !important;
  stroke: lightgrey !important; }

.messageText {
  fill: lightgrey;
  stroke: none; }

.labelBox {
  stroke: #81B1DB;
  fill: #BDD5EA; }

.labelText {
  fill: #323D47;
  stroke: none; }

.loopText {
  fill: lightgrey;
  stroke: none; }

.loopLine {
  stroke-width: 2;
  stroke-dasharray: '2 2';
  stroke: #81B1DB; }

.note {
  stroke: rgba(255, 255, 255, 0.25);
  fill: #fff5ad; }

.noteText {
  fill: black;
  stroke: none;
  
  font-size: 14px; }

.activation0 {
  fill: #f4f4f4;
  stroke: #666; }

.activation1 {
  fill: #f4f4f4;
  stroke: #666; }

.activation2 {
  fill: #f4f4f4;
  stroke: #666; }

/** Section styling */
.section {
  stroke: none;
  opacity: 0.2; }

.section0 {
  fill: rgba(255, 255, 255, 0.3); }

.section2 {
  fill: #EAE8B9; }

.section1,
.section3 {
  fill: white;
  opacity: 0.2; }

.sectionTitle0 {
  fill: #F9FFFE; }

.sectionTitle1 {
  fill: #F9FFFE; }

.sectionTitle2 {
  fill: #F9FFFE; }

.sectionTitle3 {
  fill: #F9FFFE; }

.sectionTitle {
  text-anchor: start;
  font-size: 11px;
  text-height: 14px;
   }

/* Grid and axis */
.grid .tick {
  stroke: lightgrey;
  opacity: 0.3;
  shape-rendering: crispEdges; }

.grid path {
  stroke-width: 0; }

/* Today line */
.today {
  fill: none;
  stroke: #DB5757;
  stroke-width: 2px; }

/* Task styling */
/* Default task */
.task {
  stroke-width: 2; }

.taskText {
  text-anchor: middle;
   }

.taskText:not([font-size]) {
  font-size: 11px; }

.taskTextOutsideRight {
  fill: #323D47;
  text-anchor: start;
  font-size: 11px;
   }

.taskTextOutsideLeft {
  fill: #323D47;
  text-anchor: end;
  font-size: 11px; }

/* Special case clickable */
.task.clickable {
  cursor: pointer; }

.taskText.clickable {
  cursor: pointer;
  fill: #003163 !important;
  font-weight: bold; }

.taskTextOutsideLeft.clickable {
  cursor: pointer;
  fill: #003163 !important;
  font-weight: bold; }

.taskTextOutsideRight.clickable {
  cursor: pointer;
  fill: #003163 !important;
  font-weight: bold; }

/* Specific task settings for the sections*/
.taskText0,
.taskText1,
.taskText2,
.taskText3 {
  fill: #323D47; }

.task0,
.task1,
.task2,
.task3 {
  fill: #BDD5EA;
  stroke: rgba(255, 255, 255, 0.5); }

.taskTextOutside0,
.taskTextOutside2 {
  fill: lightgrey; }

.taskTextOutside1,
.taskTextOutside3 {
  fill: lightgrey; }

/* Active task */
.active0,
.active1,
.active2,
.active3 {
  fill: #81B1DB;
  stroke: rgba(255, 255, 255, 0.5); }

.activeText0,
.activeText1,
.activeText2,
.activeText3 {
  fill: #323D47 !important; }

/* Completed task */
.done0,
.done1,
.done2,
.done3 {
  stroke: grey;
  fill: lightgrey;
  stroke-width: 2; }

.doneText0,
.doneText1,
.doneText2,
.doneText3 {
  fill: #323D47 !important; }

/* Tasks on the critical line */
.crit0,
.crit1,
.crit2,
.crit3 {
  stroke: #E83737;
  fill: #E83737;
  stroke-width: 2; }

.activeCrit0,
.activeCrit1,
.activeCrit2,
.activeCrit3 {
  stroke: #E83737;
  fill: #81B1DB;
  stroke-width: 2; }

.doneCrit0,
.doneCrit1,
.doneCrit2,
.doneCrit3 {
  stroke: #E83737;
  fill: lightgrey;
  stroke-width: 2;
  cursor: pointer;
  shape-rendering: crispEdges; }

.milestone {
  transform: rotate(45deg) scale(0.8, 0.8); }

.milestoneText {
  font-style: italic; }

.doneCritText0,
.doneCritText1,
.doneCritText2,
.doneCritText3 {
  fill: #323D47 !important; }

.activeCritText0,
.activeCritText1,
.activeCritText2,
.activeCritText3 {
  fill: #323D47 !important; }

.titleText {
  text-anchor: middle;
  font-size: 18px;
  fill: #323D47;
   }

g.classGroup text {
  fill: #9370DB;
  stroke: none;
  
  font-size: 10px; }
  g.classGroup text .title {
    font-weight: bolder; }

g.classGroup rect {
  fill: #BDD5EA;
  stroke: #9370DB; }

g.classGroup line {
  stroke: #9370DB;
  stroke-width: 1; }

.classLabel .box {
  stroke: none;
  stroke-width: 0;
  fill: #BDD5EA;
  opacity: 0.5; }

.classLabel .label {
  fill: #9370DB;
  font-size: 10px; }

.relation {
  stroke: #9370DB;
  stroke-width: 1;
  fill: none; }

#compositionStart {
  fill: #9370DB;
  stroke: #9370DB;
  stroke-width: 1; }

#compositionEnd {
  fill: #9370DB;
  stroke: #9370DB;
  stroke-width: 1; }

#aggregationStart {
  fill: #BDD5EA;
  stroke: #9370DB;
  stroke-width: 1; }

#aggregationEnd {
  fill: #BDD5EA;
  stroke: #9370DB;
  stroke-width: 1; }

#dependencyStart {
  fill: #9370DB;
  stroke: #9370DB;
  stroke-width: 1; }

#dependencyEnd {
  fill: #9370DB;
  stroke: #9370DB;
  stroke-width: 1; }

#extensionStart {
  fill: #9370DB;
  stroke: #9370DB;
  stroke-width: 1; }

#extensionEnd {
  fill: #9370DB;
  stroke: #9370DB;
  stroke-width: 1; }

.commit-id,
.commit-msg,
.branch-label {
  fill: lightgrey;
  color: lightgrey;
   }

.pieTitleText {
  text-anchor: middle;
  font-size: 25px;
  fill: #eee;
}

g.stateGroup text {
  stroke: none;
  font-size: 10px;
}

g.stateGroup circle {
  fill: white !important;
  stroke: white !important;
}

g.stateGroup .state-title {
  font-weight: bolder;
  fill: black; }

g.stateGroup rect {
  fill: #ececff;
  stroke: #9370DB; }

g.stateGroup line {
  stroke: #9370DB;
  stroke-width: 1; }

.transition {
  stroke: #9370DB;
  stroke-width: 1;
  fill: none; }

.stateGroup .composit {
  fill: #555;
  border-bottom: 1px; }

.state-note {
  stroke: rgba(255, 255, 255, 0.25);
  fill: #fff5ad; }
  .state-note text {
    fill: black;
    stroke: none;
    font-size: 10px; }

.stateLabel .box {
  stroke: none;
  stroke-width: 0;
  fill: #BDD5EA;
  opacity: 0.5; }

.stateLabel text {
  fill: black;
  font-size: 10px;
  font-weight: bold;
}

.cluster-label {
  color:black;
}

.statediagram-cluster rect {
  fill: #BDD5EA;
  stroke: #9370DB; 
  stroke-width: 1px;
}
.statediagram-cluster rect.outer {
  rx: 5px;
  ry: 5px;
}
.statediagram-state .divider {
  stroke: #9370DB; 
}

.statediagram-state .title-state {
  rx: 5px;
  ry: 5px;
}
.statediagram-cluster.statediagram-cluster .inner {
  fill: white;
}
.statediagram-cluster.statediagram-cluster-alt .inner {
  fill: #e0e0e0;
}

.statediagram-cluster .inner {
  rx:0;
  ry:0;
}

.statediagram-state rect.basic {
  rx: 5px;
  ry: 5px;
}
.statediagram-state rect.divider {
  stroke-dasharray: 10,10;
  fill: #efefef;
}

.note-edge {
  stroke-dasharray: 5;
}

.statediagram-note rect {
  stroke: var(--cluster-border);
  fill: #fff5ad;
  stroke-width: 1px;
  rx: 0;
  ry: 0;
}

.node circle.state-start {
  fill: black;
  stroke: black;
}
.node circle.state-end {
  fill: black;
  stroke: white;
  stroke-width: 1.5
}
#statediagram-barbEnd {
  fill: #9370DB; 
}

/* CSS Document */

/** code highlight */

.cm-s-inner .cm-variable,
.cm-s-inner .cm-operator,
.cm-s-inner .cm-property {
    color: #b8bfc6;
}

.cm-s-inner .cm-keyword {
    color: #C88FD0;
}

.cm-s-inner .cm-tag {
    color: #7DF46A;
}

.cm-s-inner .cm-attribute {
    color: #7575E4;
}

.CodeMirror div.CodeMirror-cursor {
    border-left: 1px solid #b8bfc6;
    z-index: 3;
}

.cm-s-inner .cm-string {
    color: #D26B6B;
}

.cm-s-inner .cm-comment,
.cm-s-inner.cm-comment {
    color: #DA924A;
}

.cm-s-inner .cm-header,
.cm-s-inner .cm-def,
.cm-s-inner.cm-header,
.cm-s-inner.cm-def {
    color: #8d8df0;
}

.cm-s-inner .cm-quote,
.cm-s-inner.cm-quote {
    color: #57ac57;
}

.cm-s-inner .cm-hr {
    color: #d8d5d5;
}

.cm-s-inner .cm-link {
    color: #d3d3ef;
}

.cm-s-inner .cm-negative {
    color: #d95050;
}

.cm-s-inner .cm-positive {
    color: #50e650;
}

.cm-s-inner .cm-string-2 {
    color: #f50;
}

.cm-s-inner .cm-meta,
.cm-s-inner .cm-qualifier {
    color: #b7b3b3;
}

.cm-s-inner .cm-builtin {
    color: #f3b3f8;
}

.cm-s-inner .cm-bracket {
    color: #997;
}

.cm-s-inner .cm-atom,
.cm-s-inner.cm-atom {
    color: #84B6CB;
}

.cm-s-inner .cm-number {
    color: #64AB8F;
}

.cm-s-inner .cm-variable {
    color: #b8bfc6;
}

.cm-s-inner .cm-variable-2 {
    color: #9FBAD5;
}

.cm-s-inner .cm-variable-3 {
    color: #1cc685;
}

.CodeMirror-selectedtext,
.CodeMirror-selected {
    background: #4a89dc;
    color: #fff !important;
    text-shadow: none;
}

.CodeMirror-gutters {
    border-right: none;
}

/* CSS Document */

/** markdown source **/
.cm-s-typora-default .cm-header, 
.cm-s-typora-default .cm-property
{
    color: #cebcca;
}

.CodeMirror.cm-s-typora-default div.CodeMirror-cursor{
    border-left: 3px solid #b8bfc6;
}

.cm-s-typora-default .cm-comment {
    color: #9FB1FF;
}

.cm-s-typora-default .cm-string {
    color: #A7A7D9
}

.cm-s-typora-default .cm-atom, .cm-s-typora-default .cm-number {
    color: #848695;
    font-style: italic;
}

.cm-s-typora-default .cm-link {
    color: #95B94B;
}

.cm-s-typora-default .CodeMirror-activeline-background {
    background: rgba(51, 51, 51, 0.72);
}

.cm-s-typora-default .cm-comment, .cm-s-typora-default .cm-code {
	color: #8aa1e1;
}

:root {
    --bg-color:  #363B40;
    --side-bar-bg-color: #2E3033;
    --text-color: #b8bfc6;

    --select-text-bg-color:#4a89dc;

    --control-text-color: #b7b7b7;
    --control-text-hover-color: #eee;
    --window-border: 1px solid #555;

    --active-file-bg-color: rgb(34, 34, 34);
    --active-file-border-color: #8d8df0;


    --active-file-text-color: white;
    --item-hover-bg-color: #70717d;
    --item-hover-text-color: white;
    --primary-color: #6dc1e7;

    --rawblock-edit-panel-bd: #333;

    --search-select-bg-color: #428bca;
}

html {
    font-size: 16px;
}

html,
body {
    -webkit-text-size-adjust: 100%;
    -ms-text-size-adjust: 100%;
    background: #363B40;
    background: var(--bg-color);
    fill: currentColor;
    line-height: 1.625rem;
}

#write {
    max-width: 1080px;
}


@media only screen and (min-width: 1400px) {
	#write {
		max-width: 1024px;
	}
}

@media only screen and (min-width: 1800px) {
	#write {
		max-width: 1200px;
	}
}

html,
body,
button,
input,
select,
textarea,
div.code-tooltip-content {
    color: #b8bfc6;
    border-color: transparent;
}

div.code-tooltip,
.md-hover-tip .md-arrow:after {
    background: #333;
}

.popover.bottom > .arrow:after {
    border-bottom-color: #333;
}

html,
body,
button,
input,
select,
textarea {
    font-family: "Helvetica Neue", Helvetica, Arial, sans-serif;
}

hr {
    height: 2px;
    border: 0;
    margin: 24px 0 !important;
}

h1,
h2,
h3,
h4,
h5,
h6 {
    font-family: "Lucida Grande", "Corbel", sans-serif;
    font-weight: normal;
    clear: both;
    -ms-word-wrap: break-word;
    word-wrap: break-word;
    margin: 0;
    padding: 0;
    color: #DEDEDE
}

h1 {
    font-size: 2.5rem;
    /* 36px */
    line-height: 2.75rem;
    /* 40px */
    margin-bottom: 1.5rem;
    /* 24px */
    letter-spacing: -1.5px;
}

h2 {
    font-size: 1.63rem;
    /* 24px */
    line-height: 1.875rem;
    /* 30px */
    margin-bottom: 1.5rem;
    /* 24px */
    letter-spacing: -1px;
    font-weight: bold;
}

h3 {
    font-size: 1.17rem;
    /* 18px */
    line-height: 1.5rem;
    /* 24px */
    margin-bottom: 1.5rem;
    /* 24px */
    letter-spacing: -1px;
    font-weight: bold;
}

h4 {
    font-size: 1.12rem;
    /* 16px */
    line-height: 1.375rem;
    /* 22px */
    margin-bottom: 1.5rem;
    /* 24px */
    color: white;
}

h5 {
    font-size: 0.97rem;
    /* 16px */
    line-height: 1.25rem;
    /* 22px */
    margin-bottom: 1.5rem;
    /* 24px */
    font-weight: bold;
}

h6 {
    font-size: 0.93rem;
    /* 16px */
    line-height: 1rem;
    /* 16px */
    margin-bottom: 0.75rem;
    color: white;
}

@media (min-width: 980px) {
    h3.md-focus:before,
    h4.md-focus:before,
    h5.md-focus:before,
    h6.md-focus:before {
        color: #ddd;
        border: 1px solid #ddd;
        border-radius: 3px;
        position: absolute;
        left: -1.642857143rem;
        top: .357142857rem;
        float: left;
        font-size: 9px;
        padding-left: 2px;
        padding-right: 2px;
        vertical-align: bottom;
        font-weight: normal;
        line-height: normal;
    }

    h3.md-focus:before {
        content: 'h3';
    }

    h4.md-focus:before {
        content: 'h4';
    }

    h5.md-focus:before {
        content: 'h5';
        top: 0px;
    }

    h6.md-focus:before {
        content: 'h6';
        top: 0px;
    }
}

a {
    text-decoration: none;
    outline: 0;
}

a:hover {
    outline: 0;
}

a:focus {
    outline: thin dotted;
}

sup.md-footnote {
    background-color: #555;
    color: #ddd;
}

p {
    -ms-word-wrap: break-word;
    word-wrap: break-word;
}

p,
ul,
dd,
ol,
hr,
address,
pre,
table,
iframe,
.wp-caption,
.wp-audio-shortcode,
.wp-video-shortcode {
    margin-top: 0;
    margin-bottom: 1.5rem;
    /* 24px */
}

li > blockquote {
	margin-bottom: 0;
}

audio:not([controls]) {
    display: none;
}

[hidden] {
    display: none;
}

::-moz-selection {
    background: #4a89dc;
    color: #fff;
    text-shadow: none;
}

*.in-text-selection,
::selection {
    background: #4a89dc;
    color: #fff;
    text-shadow: none;
}

ul,
ol {
    padding: 0 0 0 1.875rem;
    /* 30px */
}

ul {
    list-style: square;
}

ol {
    list-style: decimal;
}

ul ul,
ol ol,
ul ol,
ol ul {
    margin: 0;
}

b,
th,
dt,
strong {
    font-weight: bold;
}

i,
em,
dfn,
cite {
    font-style: italic;
}

blockquote {
    padding-left: 1.875rem;
    margin: 0 0 1.875rem 1.875rem;
    border-left: solid 2px #474d54;
    padding-left: 30px;
    margin-top: 35px;
}

pre,
code,
kbd,
tt,
var {
    font-size: 0.875rem;
    font-family: Monaco, Consolas, "Andale Mono", "DejaVu Sans Mono", monospace;
}

code,
tt,
var {
    background: rgba(0, 0, 0, 0.05);
}

kbd {
    padding: 2px 4px;
    font-size: 90%;
    color: #fff;
    background-color: #333;
    border-radius: 3px;
    box-shadow: inset 0 -1px 0 rgba(0,0,0,.25);
}

pre.md-fences {
    padding: 10px 10px 10px 30px;
    margin-bottom: 20px;
    background: #333;
}

.CodeMirror-gutters {
    background: #333;
    border-right: 1px solid transparent;
}

.enable-diagrams pre.md-fences[lang="sequence"] .code-tooltip,
.enable-diagrams pre.md-fences[lang="flow"] .code-tooltip,
.enable-diagrams pre.md-fences[lang="mermaid"] .code-tooltip {
    bottom: -2.2em;
    right: 4px;
}

code,
kbd,
tt,
var {
    padding: 2px 5px;
}

table {
    max-width: 100%;
    width: 100%;
    border-collapse: collapse;
    border-spacing: 0;
}

th,
td {
    padding: 5px 10px;
    vertical-align: top;
}

a {
    -webkit-transition: all .2s ease-in-out;
    transition: all .2s ease-in-out;
}

hr {
    background: #474d54;
    /* variable */
}

h1 {
    margin-top: 2em;
}

a {
    color: #e0e0e0;
    text-decoration: underline;
}

a:hover {
    color: #fff;
}

.md-inline-math script {
    color: #81b1db;
}

b,
th,
dt,
strong {
    color: #DEDEDE;
    /* variable */
}

mark {
    background: #D3D40E;
}

blockquote {
    color: #9DA2A6;
}

table a {
    color: #DEDEDE;
    /* variable */
}

th,
td {
    border: solid 1px #474d54;
    /* variable */
}

.task-list {
    padding-left: 0;
}

.md-task-list-item {
    padding-left: 1.25rem;
}

.md-task-list-item > input {
    top: auto;
}

.md-task-list-item > input:before {
    content: "";
    display: inline-block;
    width: 0.875rem;
    height: 0.875rem;
    vertical-align: middle;
    text-align: center;
    border: 1px solid #b8bfc6;
    background-color: #363B40;
    margin-top: -0.4rem;
}

.md-task-list-item > input:checked:before,
.md-task-list-item > input[checked]:before {
    content: '\221A';
    /*◘*/
    font-size: 0.625rem;
    line-height: 0.625rem;
    color: #DEDEDE;
}

/** quick open **/
.auto-suggest-container {
    border: 0px;
    background-color: #525C65;
}

#typora-quick-open {
    background-color: #525C65;
}

#typora-quick-open input{
    background-color: #525C65;
    border: 0;
    border-bottom: 1px solid grey;
}

.typora-quick-open-item {
    background-color: inherit;
    color: inherit;
}

.typora-quick-open-item.active,
.typora-quick-open-item:hover {
    background-color: #4D8BDB;
    color: white;
}

.typora-quick-open-item:hover {
    background-color: rgba(77, 139, 219, 0.8);
}

.typora-search-spinner > div {
  background-color: #fff;
}

#write pre.md-meta-block {
    border-bottom: 1px dashed #ccc;
    background: transparent;
    padding-bottom: 0.6em;
    line-height: 1.6em;
}

.btn,
.btn .btn-default {
    background: transparent;
    color: #b8bfc6;
}

.ty-table-edit {
    border-top: 1px solid gray;
    background-color: #363B40;
}

.popover-title {
    background: transparent;
}

.md-image>.md-meta {
    color: #BBBBBB;
    background: transparent;
}

.md-expand.md-image>.md-meta {
    color: #DDD;
}

#write>h3:before,
#write>h4:before,
#write>h5:before,
#write>h6:before {
    border: none;
    border-radius: 0px;
    color: #888;
    text-decoration: underline;
    left: -1.4rem;
    top: 0.2rem;
}

#write>h3.md-focus:before {
    top: 2px;
}

#write>h4.md-focus:before {
    top: 2px;
}

.md-toc-item {
    color: #A8C2DC;
}

#write div.md-toc-tooltip {
    background-color: #363B40;
}

.dropdown-menu .btn:hover,
.dropdown-menu .btn:focus,
.md-toc .btn:hover,
.md-toc .btn:focus {
    color: white;
    background: black;
}

#toc-dropmenu {
    background: rgba(50, 54, 59, 0.93);
    border: 1px solid rgba(253, 253, 253, 0.15);
}

#toc-dropmenu .divider {
    background-color: #9b9b9b;
}

.outline-expander:before {
    top: 2px;
}

#typora-sidebar {
    box-shadow: none;
    border-right: 1px dashed;
    border-right: none;
}

.sidebar-tabs {
    border-bottom:0;
}

#typora-sidebar:hover .outline-title-wrapper {
    border-left: 1px dashed;
}

.outline-title-wrapper .btn {
    color: inherit;
}

.outline-item:hover {
    border-color: #363B40;
    background-color: #363B40;
    color: white;
}

h1.md-focus .md-attr,
h2.md-focus .md-attr,
h3.md-focus .md-attr,
h4.md-focus .md-attr,
h5.md-focus .md-attr,
h6.md-focus .md-attr,
.md-header-span .md-attr {
    color: #8C8E92;
    display: inline;
}

.md-comment {
    color: #5a95e3;
    opacity: 1;
}

.md-inline-math svg {
    color: #b8bfc6;
}

#math-inline-preview .md-arrow:after {
    background: black;
}

.modal-content {
    background: var(--bg-color);
    border: 0;
}

.modal-title {
    font-size: 1.5em;
}

.modal-content input {
    background-color: rgba(26, 21, 21, 0.51);
    color: white;
}

.modal-content .input-group-addon {
    color: white;
}

.modal-backdrop {
    background-color: rgba(174, 174, 174, 0.7);
}

.modal-content .btn-primary {
    border-color: var(--primary-color);
}

.md-table-resize-popover {
    background-color: #333;
}

.form-inline .input-group .input-group-addon {
    color: white;
}

#md-searchpanel {
    border-bottom: 1px dashed grey;
}

/** UI for electron */

.context-menu,
#spell-check-panel,
#footer-word-count-info {
    background-color: #42464A;
}

.context-menu.dropdown-menu .divider,
.dropdown-menu .divider {
    background-color: #777777;
}

footer {
    color: inherit;
}

@media (max-width: 1000px) {
    footer {
        border-top: none;
    }
    footer:hover {
        color: inherit;
    }
}

#file-info-file-path .file-info-field-value:hover {
    background-color: #555;
    color: #dedede;
}

.megamenu-content,
.megamenu-opened header {
    background: var(--bg-color);
}

.megamenu-menu-panel h2,
.megamenu-menu-panel h1,
.long-btn {
    color: inherit;
}

.megamenu-menu-panel input[type='text'] {
    background: inherit;
    border: 0;
    border-bottom: 1px solid;
}

#recent-file-panel-action-btn {
    background: inherit;
    border: 1px grey solid;
}

.megamenu-menu-panel .dropdown-menu > li > a {
    color: inherit;
    background-color: #2F353A;
    text-decoration: none;
}

.megamenu-menu-panel table td:nth-child(1) {
    color: inherit;
    font-weight: bold;
}

.megamenu-menu-panel tbody tr:hover td:nth-child(1) {
    color: white;
}

.modal-footer .btn-default, 
.modal-footer .btn-primary,
.modal-footer .btn-default:not(:hover) {
    border: 1px solid;
    border-color: transparent;
}

.btn-default:hover, .btn-default:focus, .btn-default.focus, .btn-default:active, .btn-default.active, .open > .dropdown-toggle.btn-default {
    color: white;
    border: 1px solid #ddd;
    background-color: inherit;
}

.modal-header {
    border-bottom: 0;
}

.modal-footer {
    border-top: 0;
}

#recent-file-panel tbody tr:nth-child(2n-1) {
    background-color: transparent !important;
}

.megamenu-menu-panel tbody tr:hover td:nth-child(2) {
    color: inherit;
}

.megamenu-menu-panel .btn {
    border: 1px solid #eee;
    background: transparent;
}

.mouse-hover .toolbar-icon.btn:hover,
#w-full.mouse-hover,
#w-pin.mouse-hover {
    background-color: inherit;
}

.typora-node::-webkit-scrollbar {
    width: 5px;
}

.typora-node::-webkit-scrollbar-thumb:vertical {
    background: rgba(250, 250, 250, 0.3);
}

.typora-node::-webkit-scrollbar-thumb:vertical:active {
    background: rgba(250, 250, 250, 0.5);
}

#w-unpin {
    background-color: #4182c4;
}

#top-titlebar, #top-titlebar * {
    color: var(--item-hover-text-color);
}

.typora-sourceview-on #toggle-sourceview-btn,
#footer-word-count:hover,
.ty-show-word-count #footer-word-count {
    background: #333333;
}

#toggle-sourceview-btn:hover {
    color: #eee;
    background: #333333;
}

/** focus mode */
.on-focus-mode .md-end-block:not(.md-focus):not(.md-focus-container) * {
    color: #686868 !important;
}

.on-focus-mode .md-end-block:not(.md-focus) img,
.on-focus-mode .md-task-list-item:not(.md-focus-container)>input {
    opacity: #686868 !important;
}

.on-focus-mode li[cid]:not(.md-focus-container){
    color: #686868;
}

.on-focus-mode .md-fences.md-focus .CodeMirror-code>*:not(.CodeMirror-activeline) *,
.on-focus-mode .CodeMirror.cm-s-inner:not(.CodeMirror-focused) * {
    color: #686868 !important;
}

.on-focus-mode .md-focus,
.on-focus-mode .md-focus-container {
    color: #fff;
}

.on-focus-mode #typora-source .CodeMirror-code>*:not(.CodeMirror-activeline) * {
    color: #686868 !important;
}


/*diagrams*/
#write .md-focus .md-diagram-panel {
    border: 1px solid #ddd;
    margin-left: -1px;
    width: calc(100% + 2px);
}

/*diagrams*/
#write .md-focus.md-fences-with-lineno .md-diagram-panel {
    margin-left: auto;
}

.md-diagram-panel-error {
    color: #f1908e;
}

.active-tab-files #info-panel-tab-file,
.active-tab-files #info-panel-tab-file:hover,
.active-tab-outline #info-panel-tab-outline,
.active-tab-outline #info-panel-tab-outline:hover {
    color: #eee;
}

.sidebar-footer-item:hover,
.footer-item:hover {
    background: inherit;
    color: white;
}

.ty-side-sort-btn.active,
.ty-side-sort-btn:hover,
.selected-folder-menu-item a:after {
    color: white;
}

#sidebar-files-menu {
    border:solid 1px;
    box-shadow: 4px 4px 20px rgba(0, 0, 0, 0.79);
    background-color: var(--bg-color);
}

.file-list-item {
    border-bottom:none;
}

.file-list-item-summary {
    opacity: 1;
}

.file-list-item.active:first-child {
    border-top: none;
}

.file-node-background {
    height: 32px;
}

.file-library-node.active>.file-node-content,
.file-list-item.active {
    color: white;
    color: var(--active-file-text-color);
}

.file-library-node.active>.file-node-background{
    background-color: rgb(34, 34, 34);
    background-color: var(--active-file-bg-color);
}
.file-list-item.active {
    background-color: rgb(34, 34, 34);
    background-color: var(--active-file-bg-color);
}

#ty-tooltip {
    background-color: black;
    color: #eee;
}

.md-task-list-item>input {
    margin-left: -1.3em;
    margin-top: 0.3rem;
    -webkit-appearance: none;
}

.md-mathjax-midline {
    background-color: #57616b;
    border-bottom: none;
}

footer.ty-footer {
    border-color: #656565;
}

.ty-preferences .btn-default {
    background: transparent;
}
.ty-preferences .btn-default:hover {
    background: #57616b;
}

.ty-preferences select {
    border: 1px solid #989698;
    height: 21px;
}

.ty-preferences .nav-group-item.active {
    background: var(--item-hover-bg-color);
}

.ty-preferences input[type="search"] {
    border-color: #333;
    background: #333;
    line-height: 22px;
    border-radius: 6px;
    color: white;
}

.ty-preferences input[type="search"]:focus {
    box-shadow: none;
}

[data-is-directory="true"] .file-node-content {
    margin-bottom: 0;
}

.file-node-title {
    line-height: 22px;
}

.html-for-mac .file-node-open-state, .html-for-mac .file-node-icon {
    line-height: 26px;
}

::-webkit-scrollbar-thumb {
    background: rgba(230, 230, 230, 0.30);
}

::-webkit-scrollbar-thumb:active {
    background: rgba(230, 230, 230, 0.50);
}

#typora-sidebar:hover div.sidebar-content-content::-webkit-scrollbar-thumb:horizontal {
    background: rgba(230, 230, 230, 0.30);
}

.nav-group-item:active {
    background-color: #474d54;
}

.md-search-hit {
    background: rgba(199, 140, 60, 0.81);
    color: #eee;
}

.md-search-hit * {
    color: #eee;
}

#md-searchpanel input {
    color: white;
}

.export-detail,
.export-item.active,
.export-items-list-control {
    background: #d6d6d4;
}


</style>
</head>
<body class='typora-export os-windows'>
<div id='write'  class=''><h2><a name="第七章回合更新策略梯度方法" class="md-header-anchor"></a><span>Chapter 7: Policy Gradient Methods with Episodic Updates</span></h2><p><span>The algorithms in the previous chapters all search for an optimal policy by estimating the optimal value function; such algorithms are called </span><strong><span>optimal value algorithms</span></strong><span>. Starting with this chapter, we introduce algorithms that approximate the optimal policy directly with a parametric function and iteratively update its parameters; these are called </span><strong><span>policy gradient algorithms</span></strong><span>.</span></p><h3><a name="一策略梯度算法的原理" class="md-header-anchor"></a><span>1. Principles of Policy Gradient Algorithms</span></h3><p><span>The basic idea of estimating the optimal policy </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="7.392ex" height="2.71ex" viewBox="0 -832.7 3182.6 1166.9" role="img" focusable="false" style="vertical-align: -0.776ex;"><defs><path stroke-width="0" id="E1935-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1935-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1935-MJMATHI-61" d="M33 157Q33 258 109 349T280 441Q331 441 370 392Q386 422 416 422Q429 422 439 414T449 394Q449 381 412 234T374 68Q374 43 381 35T402 26Q411 27 422 35Q443 55 463 131Q469 151 473 152Q475 153 483 153H487Q506 153 506 144Q506 138 501 117T481 63T449 13Q436 0 417 -8Q409 -10 393 -10Q359 -10 336 5T306 36L300 51Q299 52 296 50Q294 48 292 46Q233 -10 172 -10Q117 -10 75 30T33 157ZM351 328Q351 334 346 
350T323 385T277 405Q242 405 210 374T160 293Q131 214 119 129Q119 126 119 118T118 106Q118 61 136 44T179 26Q217 26 254 59T298 110Q300 114 325 217T351 328Z"></path><path stroke-width="0" id="E1935-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path stroke-width="0" id="E1935-MJMATHI-73" d="M131 289Q131 321 147 354T203 415T300 442Q362 442 390 415T419 355Q419 323 402 308T364 292Q351 292 340 300T328 326Q328 342 337 354T354 372T367 378Q368 378 368 379Q368 382 361 388T336 399T297 405Q249 405 227 379T204 326Q204 301 223 291T278 274T330 259Q396 230 396 163Q396 135 385 107T352 51T289 7T195 -10Q118 -10 86 19T53 87Q53 126 74 143T118 160Q133 160 146 151T160 120Q160 94 142 76T111 58Q109 57 108 57T107 55Q108 52 115 47T146 34T201 27Q237 27 263 38T301 66T318 97T323 122Q323 150 302 164T254 181T195 196T148 231Q131 256 131 289Z"></path><path stroke-width="0" id="E1935-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1935-MJMATHI-3C0" x="0" y="0"></use><use xlink:href="#E1935-MJMAIN-28" x="573" y="0"></use><use xlink:href="#E1935-MJMATHI-61" x="962" y="0"></use><use xlink:href="#E1935-MJMAIN-2223" x="1768" y="0"></use><use xlink:href="#E1935-MJMATHI-73" x="2324" y="0"></use><use xlink:href="#E1935-MJMAIN-29" x="2793" y="0"></use></g></svg></span><script type="math/tex">\pi(a \mid s)</script><span> by function approximation is to use a parametric function </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="9.514ex" height="2.71ex" viewBox="0 -832.7 4096.2 1166.9" role="img" focusable="false" style="vertical-align: 
-0.776ex;"><defs><path stroke-width="0" id="E1936-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1936-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1936-MJMATHI-61" d="M33 157Q33 258 109 349T280 441Q331 441 370 392Q386 422 416 422Q429 422 439 414T449 394Q449 381 412 234T374 68Q374 43 381 35T402 26Q411 27 422 35Q443 55 463 131Q469 151 473 152Q475 153 483 153H487Q506 153 506 144Q506 138 501 117T481 63T449 13Q436 0 417 -8Q409 -10 393 -10Q359 -10 336 5T306 36L300 51Q299 52 296 50Q294 48 292 46Q233 -10 172 -10Q117 -10 75 30T33 157ZM351 328Q351 334 346 350T323 385T277 405Q242 405 210 374T160 293Q131 214 119 129Q119 126 119 118T118 106Q118 61 136 44T179 26Q217 26 254 59T298 110Q300 114 325 217T351 328Z"></path><path stroke-width="0" id="E1936-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path stroke-width="0" id="E1936-MJMATHI-73" d="M131 289Q131 321 147 354T203 415T300 442Q362 442 390 415T419 355Q419 323 402 308T364 292Q351 292 340 300T328 326Q328 342 337 354T354 372T367 378Q368 378 368 379Q368 382 361 388T336 399T297 405Q249 405 227 379T204 326Q204 301 223 291T278 274T330 259Q396 230 396 163Q396 135 385 107T352 51T289 7T195 -10Q118 -10 86 19T53 87Q53 126 74 143T118 160Q133 160 146 
151T160 120Q160 94 142 76T111 58Q109 57 108 57T107 55Q108 52 115 47T146 34T201 27Q237 27 263 38T301 66T318 97T323 122Q323 150 302 164T254 181T195 196T148 231Q131 256 131 289Z"></path><path stroke-width="0" id="E1936-MJMAIN-3B" d="M78 370Q78 394 95 412T138 430Q162 430 180 414T199 371Q199 346 182 328T139 310T96 327T78 370ZM78 60Q78 85 94 103T137 121Q202 121 202 8Q202 -44 183 -94T144 -169T118 -194Q115 -194 106 -186T95 -174Q94 -171 107 -155T137 -107T160 -38Q161 -32 162 -22T165 -4T165 4Q165 5 161 4T142 0Q110 0 94 18T78 60Z"></path><path stroke-width="0" id="E1936-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E1936-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1936-MJMATHI-3C0" x="0" y="0"></use><use xlink:href="#E1936-MJMAIN-28" x="573" y="0"></use><use xlink:href="#E1936-MJMATHI-61" x="962" y="0"></use><use xlink:href="#E1936-MJMAIN-2223" x="1768" y="0"></use><use xlink:href="#E1936-MJMATHI-73" x="2324" y="0"></use><use xlink:href="#E1936-MJMAIN-3B" x="2793" y="0"></use><use xlink:href="#E1936-MJMATHI-3B8" x="3238" y="0"></use><use xlink:href="#E1936-MJMAIN-29" x="3707" y="0"></use></g></svg></span><script type="math/tex">\pi(a \mid s; \theta)</script><span> to approximate it. Since every policy must satisfy, for every state </span><span class="MathJax_SVG" tabindex="-1" 
style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="5.42ex" height="1.939ex" viewBox="0 -749.6 2333.6 834.7" role="img" focusable="false" style="vertical-align: -0.198ex;"><defs><path stroke-width="0" id="E1937-MJMATHI-73" d="M131 289Q131 321 147 354T203 415T300 442Q362 442 390 415T419 355Q419 323 402 308T364 292Q351 292 340 300T328 326Q328 342 337 354T354 372T367 378Q368 378 368 379Q368 382 361 388T336 399T297 405Q249 405 227 379T204 326Q204 301 223 291T278 274T330 259Q396 230 396 163Q396 135 385 107T352 51T289 7T195 -10Q118 -10 86 19T53 87Q53 126 74 143T118 160Q133 160 146 151T160 120Q160 94 142 76T111 58Q109 57 108 57T107 55Q108 52 115 47T146 34T201 27Q237 27 263 38T301 66T318 97T323 122Q323 150 302 164T254 181T195 196T148 231Q131 256 131 289Z"></path><path stroke-width="0" id="E1937-MJMAIN-2208" d="M84 250Q84 372 166 450T360 539Q361 539 377 539T419 540T469 540H568Q583 532 583 520Q583 511 570 501L466 500Q355 499 329 494Q280 482 242 458T183 409T147 354T129 306T124 272V270H568Q583 262 583 250T568 230H124V228Q124 207 134 177T167 112T231 48T328 7Q355 1 466 0H570Q583 -10 583 -20Q583 -32 568 -40H471Q464 -40 446 -40T417 -41Q262 -41 172 45Q84 127 84 250Z"></path><path stroke-width="0" id="E1937-MJCAL-53" d="M554 512Q536 512 536 522Q536 525 539 539T542 564Q542 588 528 604Q515 616 482 625T410 635Q374 635 349 624T312 594T295 561T290 532Q290 505 303 482T342 442T378 419T409 404Q435 391 451 383T494 357T535 323T562 282T574 231Q574 133 464 56T220 -22Q138 -22 78 21T18 123Q18 184 61 227T156 274Q178 274 178 263Q178 260 177 258Q172 247 164 239T151 227T136 218L127 213L124 202Q118 186 118 163Q120 124 165 86T292 48Q374 48 423 86T473 186V193Q473 267 347 327Q268 364 239 389Q191 431 191 486Q191 547 242 600T356 679T470 705Q472 705 478 705T489 704Q551 704 596 682T642 610Q642 566 621 545Q592 516 554 512Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use 
xlink:href="#E1937-MJMATHI-73" x="0" y="0"></use><use xlink:href="#E1937-MJMAIN-2208" x="746" y="0"></use><use xlink:href="#E1937-MJCAL-53" x="1691" y="0"></use></g></svg></span><script type="math/tex">s \in \mathcal S</script><span>, the constraint </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="15.391ex" height="5.122ex" viewBox="0 -998.8 6626.8 2205.2" role="img" focusable="false" style="vertical-align: -2.802ex;"><defs><path stroke-width="0" id="E1938-MJSZ2-2211" d="M60 948Q63 950 665 950H1267L1325 815Q1384 677 1388 669H1348L1341 683Q1320 724 1285 761Q1235 809 1174 838T1033 881T882 898T699 902H574H543H251L259 891Q722 258 724 252Q725 250 724 246Q721 243 460 -56L196 -356Q196 -357 407 -357Q459 -357 548 -357T676 -358Q812 -358 896 -353T1063 -332T1204 -283T1307 -196Q1328 -170 1348 -124H1388Q1388 -125 1381 -145T1356 -210T1325 -294L1267 -449L666 -450Q64 -450 61 -448Q55 -446 55 -439Q55 -437 57 -433L590 177Q590 178 557 222T452 366T322 544L56 909L55 924Q55 945 60 948Z"></path><path stroke-width="0" id="E1938-MJMATHI-61" d="M33 157Q33 258 109 349T280 441Q331 441 370 392Q386 422 416 422Q429 422 439 414T449 394Q449 381 412 234T374 68Q374 43 381 35T402 26Q411 27 422 35Q443 55 463 131Q469 151 473 152Q475 153 483 153H487Q506 153 506 144Q506 138 501 117T481 63T449 13Q436 0 417 -8Q409 -10 393 -10Q359 -10 336 5T306 36L300 51Q299 52 296 50Q294 48 292 46Q233 -10 172 -10Q117 -10 75 30T33 157ZM351 328Q351 334 346 350T323 385T277 405Q242 405 210 374T160 293Q131 214 119 129Q119 126 119 118T118 106Q118 61 136 44T179 26Q217 26 254 59T298 110Q300 114 325 217T351 328Z"></path><path stroke-width="0" id="E1938-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 
170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1938-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1938-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path stroke-width="0" id="E1938-MJMATHI-73" d="M131 289Q131 321 147 354T203 415T300 442Q362 442 390 415T419 355Q419 323 402 308T364 292Q351 292 340 300T328 326Q328 342 337 354T354 372T367 378Q368 378 368 379Q368 382 361 388T336 399T297 405Q249 405 227 379T204 326Q204 301 223 291T278 274T330 259Q396 230 396 163Q396 135 385 107T352 51T289 7T195 -10Q118 -10 86 19T53 87Q53 126 74 143T118 160Q133 160 146 151T160 120Q160 94 142 76T111 58Q109 57 108 57T107 55Q108 52 115 47T146 34T201 27Q237 27 263 38T301 66T318 97T323 122Q323 150 302 164T254 181T195 196T148 231Q131 256 131 289Z"></path><path stroke-width="0" id="E1938-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path><path stroke-width="0" id="E1938-MJMAIN-3D" d="M56 347Q56 360 70 367H707Q722 359 722 347Q722 336 708 328L390 327H72Q56 332 56 347ZM56 153Q56 168 72 173H708Q722 163 722 153Q722 140 707 133H70Q56 140 56 153Z"></path><path stroke-width="0" id="E1938-MJMAIN-31" d="M213 578L200 573Q186 568 160 563T102 556H83V602H102Q149 604 189 617T245 641T273 663Q275 666 285 666Q294 666 302 660V361L303 61Q310 54 315 52T339 48T401 46H427V0H416Q395 3 257 3Q121 3 100 
0H88V46H114Q136 46 152 46T177 47T193 50T201 52T207 57T213 61V578Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1938-MJSZ2-2211" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1938-MJMATHI-61" x="756" y="-1485"></use><use xlink:href="#E1938-MJMATHI-3C0" x="1610" y="0"></use><use xlink:href="#E1938-MJMAIN-28" x="2183" y="0"></use><use xlink:href="#E1938-MJMATHI-61" x="2572" y="0"></use><use xlink:href="#E1938-MJMAIN-2223" x="3379" y="0"></use><use xlink:href="#E1938-MJMATHI-73" x="3935" y="0"></use><use xlink:href="#E1938-MJMAIN-29" x="4404" y="0"></use><use xlink:href="#E1938-MJMAIN-3D" x="5071" y="0"></use><use xlink:href="#E1938-MJMAIN-31" x="6126" y="0"></use></g></svg></span><script type="math/tex">\displaystyle \sum_a \pi(a \mid s) = 1</script><span>, we introduce an </span><strong><span>action preference function</span></strong><span> </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="8.618ex" height="2.71ex" viewBox="0 -832.7 3710.3 1166.9" role="img" focusable="false" style="vertical-align: -0.776ex;"><defs><path stroke-width="0" id="E1939-MJMATHI-68" d="M137 683Q138 683 209 688T282 694Q294 694 294 685Q294 674 258 534Q220 386 220 383Q220 381 227 388Q288 442 357 442Q411 442 444 415T478 336Q478 285 440 178T402 50Q403 36 407 31T422 26Q450 26 474 56T513 138Q516 149 519 151T535 153Q555 153 555 145Q555 144 551 130Q535 71 500 33Q466 -10 419 -10H414Q367 -10 346 17T325 74Q325 90 361 192T398 345Q398 404 354 404H349Q266 404 205 306L198 293L164 158Q132 28 127 16Q114 -11 83 -11Q69 -11 59 -2T48 16Q48 30 121 320L195 616Q195 629 188 632T149 637H128Q122 643 122 645T124 664Q129 683 137 683Z"></path><path stroke-width="0" id="E1939-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 
250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1939-MJMATHI-73" d="M131 289Q131 321 147 354T203 415T300 442Q362 442 390 415T419 355Q419 323 402 308T364 292Q351 292 340 300T328 326Q328 342 337 354T354 372T367 378Q368 378 368 379Q368 382 361 388T336 399T297 405Q249 405 227 379T204 326Q204 301 223 291T278 274T330 259Q396 230 396 163Q396 135 385 107T352 51T289 7T195 -10Q118 -10 86 19T53 87Q53 126 74 143T118 160Q133 160 146 151T160 120Q160 94 142 76T111 58Q109 57 108 57T107 55Q108 52 115 47T146 34T201 27Q237 27 263 38T301 66T318 97T323 122Q323 150 302 164T254 181T195 196T148 231Q131 256 131 289Z"></path><path stroke-width="0" id="E1939-MJMAIN-2C" d="M78 35T78 60T94 103T137 121Q165 121 187 96T210 8Q210 -27 201 -60T180 -117T154 -158T130 -185T117 -194Q113 -194 104 -185T95 -172Q95 -168 106 -156T131 -126T157 -76T173 -3V9L172 8Q170 7 167 6T161 3T152 1T140 0Q113 0 96 17Z"></path><path stroke-width="0" id="E1939-MJMATHI-61" d="M33 157Q33 258 109 349T280 441Q331 441 370 392Q386 422 416 422Q429 422 439 414T449 394Q449 381 412 234T374 68Q374 43 381 35T402 26Q411 27 422 35Q443 55 463 131Q469 151 473 152Q475 153 483 153H487Q506 153 506 144Q506 138 501 117T481 63T449 13Q436 0 417 -8Q409 -10 393 -10Q359 -10 336 5T306 36L300 51Q299 52 296 50Q294 48 292 46Q233 -10 172 -10Q117 -10 75 30T33 157ZM351 328Q351 334 346 350T323 385T277 405Q242 405 210 374T160 293Q131 214 119 129Q119 126 119 118T118 106Q118 61 136 44T179 26Q217 26 254 59T298 110Q300 114 325 217T351 328Z"></path><path stroke-width="0" id="E1939-MJMAIN-3B" d="M78 370Q78 394 95 412T138 430Q162 430 180 414T199 371Q199 346 182 328T139 310T96 327T78 370ZM78 60Q78 85 94 103T137 121Q202 121 202 8Q202 -44 183 -94T144 -169T118 -194Q115 -194 106 -186T95 -174Q94 -171 107 -155T137 -107T160 -38Q161 -32 162 -22T165 -4T165 4Q165 5 161 4T142 0Q110 0 94 18T78 60Z"></path><path stroke-width="0" id="E1939-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 
704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E1939-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1939-MJMATHI-68" x="0" y="0"></use><use xlink:href="#E1939-MJMAIN-28" x="576" y="0"></use><use xlink:href="#E1939-MJMATHI-73" x="965" y="0"></use><use xlink:href="#E1939-MJMAIN-2C" x="1434" y="0"></use><use xlink:href="#E1939-MJMATHI-61" x="1878" y="0"></use><use xlink:href="#E1939-MJMAIN-3B" x="2407" y="0"></use><use xlink:href="#E1939-MJMATHI-3B8" x="2852" y="0"></use><use xlink:href="#E1939-MJMAIN-29" x="3321" y="0"></use></g></svg></span><script type="math/tex">h(s,a;\theta)</script><span>, whose softmax is </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="9.514ex" height="2.71ex" viewBox="0 -832.7 4096.2 1166.9" role="img" focusable="false" style="vertical-align: -0.776ex;"><defs><path stroke-width="0" id="E1940-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 
121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1940-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1940-MJMATHI-61" d="M33 157Q33 258 109 349T280 441Q331 441 370 392Q386 422 416 422Q429 422 439 414T449 394Q449 381 412 234T374 68Q374 43 381 35T402 26Q411 27 422 35Q443 55 463 131Q469 151 473 152Q475 153 483 153H487Q506 153 506 144Q506 138 501 117T481 63T449 13Q436 0 417 -8Q409 -10 393 -10Q359 -10 336 5T306 36L300 51Q299 52 296 50Q294 48 292 46Q233 -10 172 -10Q117 -10 75 30T33 157ZM351 328Q351 334 346 350T323 385T277 405Q242 405 210 374T160 293Q131 214 119 129Q119 126 119 118T118 106Q118 61 136 44T179 26Q217 26 254 59T298 110Q300 114 325 217T351 328Z"></path><path stroke-width="0" id="E1940-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path stroke-width="0" id="E1940-MJMATHI-73" d="M131 289Q131 321 147 354T203 415T300 442Q362 442 390 415T419 355Q419 323 402 308T364 292Q351 292 340 300T328 326Q328 342 337 354T354 372T367 378Q368 378 368 379Q368 382 361 388T336 399T297 405Q249 405 227 379T204 326Q204 301 223 291T278 274T330 259Q396 230 396 163Q396 135 385 107T352 51T289 7T195 -10Q118 -10 86 19T53 87Q53 126 74 143T118 160Q133 160 146 151T160 120Q160 94 142 76T111 58Q109 57 108 57T107 55Q108 52 115 47T146 34T201 27Q237 27 263 38T301 66T318 97T323 122Q323 150 302 164T254 181T195 196T148 231Q131 256 131 289Z"></path><path stroke-width="0" id="E1940-MJMAIN-3B" d="M78 370Q78 394 95 412T138 430Q162 430 180 414T199 371Q199 346 182 328T139 310T96 327T78 370ZM78 60Q78 85 94 103T137 121Q202 121 202 8Q202 -44 183 -94T144 -169T118 -194Q115 -194 106 -186T95 -174Q94 -171 107 
-155T137 -107T160 -38Q161 -32 162 -22T165 -4T165 4Q165 5 161 4T142 0Q110 0 94 18T78 60Z"></path><path stroke-width="0" id="E1940-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E1940-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1940-MJMATHI-3C0" x="0" y="0"></use><use xlink:href="#E1940-MJMAIN-28" x="573" y="0"></use><use xlink:href="#E1940-MJMATHI-61" x="962" y="0"></use><use xlink:href="#E1940-MJMAIN-2223" x="1768" y="0"></use><use xlink:href="#E1940-MJMATHI-73" x="2324" y="0"></use><use xlink:href="#E1940-MJMAIN-3B" x="2793" y="0"></use><use xlink:href="#E1940-MJMATHI-3B8" x="3238" y="0"></use><use xlink:href="#E1940-MJMAIN-29" x="3707" y="0"></use></g></svg></span><script type="math/tex">\pi(a \mid s; \theta)</script><span> ，即：</span></p><div contenteditable="false" spellcheck="false" class="mathjax-block md-end-block md-math-block md-rawblock" id="mathjax-n5" cid="n5" mdtype="math_block"><div class="md-rawblock-container md-math-container" tabindex="-1"><div class="MathJax_SVG_Display"><span class="MathJax_SVG" id="MathJax-Element-819-Frame" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="98.296ex" height="5.99ex" viewBox="0 -1538.7 42321.7 2579" role="img" 
focusable="false" style="vertical-align: -2.241ex; margin-bottom: -0.175ex; max-width: 100%;"><defs><path stroke-width="0" id="E1921-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1921-MJMAIN-31" d="M213 578L200 573Q186 568 160 563T102 556H83V602H102Q149 604 189 617T245 641T273 663Q275 666 285 666Q294 666 302 660V361L303 61Q310 54 315 52T339 48T401 46H427V0H416Q395 3 257 3Q121 3 100 0H88V46H114Q136 46 152 46T177 47T193 50T201 52T207 57T213 61V578Z"></path><path stroke-width="0" id="E1921-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path><path stroke-width="0" id="E1921-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1921-MJMATHI-61" d="M33 157Q33 258 109 349T280 441Q331 441 370 392Q386 422 416 422Q429 422 439 414T449 394Q449 381 412 234T374 68Q374 43 381 35T402 26Q411 27 422 35Q443 55 463 131Q469 151 473 152Q475 153 483 153H487Q506 153 506 144Q506 138 501 117T481 63T449 13Q436 0 417 -8Q409 -10 393 -10Q359 -10 336 5T306 36L300 51Q299 52 296 50Q294 48 292 46Q233 -10 172 -10Q117 -10 75 30T33 157ZM351 328Q351 334 346 350T323 385T277 405Q242 405 210 
374T160 293Q131 214 119 129Q119 126 119 118T118 106Q118 61 136 44T179 26Q217 26 254 59T298 110Q300 114 325 217T351 328Z"></path><path stroke-width="0" id="E1921-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path stroke-width="0" id="E1921-MJMATHI-73" d="M131 289Q131 321 147 354T203 415T300 442Q362 442 390 415T419 355Q419 323 402 308T364 292Q351 292 340 300T328 326Q328 342 337 354T354 372T367 378Q368 378 368 379Q368 382 361 388T336 399T297 405Q249 405 227 379T204 326Q204 301 223 291T278 274T330 259Q396 230 396 163Q396 135 385 107T352 51T289 7T195 -10Q118 -10 86 19T53 87Q53 126 74 143T118 160Q133 160 146 151T160 120Q160 94 142 76T111 58Q109 57 108 57T107 55Q108 52 115 47T146 34T201 27Q237 27 263 38T301 66T318 97T323 122Q323 150 302 164T254 181T195 196T148 231Q131 256 131 289Z"></path><path stroke-width="0" id="E1921-MJMAIN-3B" d="M78 370Q78 394 95 412T138 430Q162 430 180 414T199 371Q199 346 182 328T139 310T96 327T78 370ZM78 60Q78 85 94 103T137 121Q202 121 202 8Q202 -44 183 -94T144 -169T118 -194Q115 -194 106 -186T95 -174Q94 -171 107 -155T137 -107T160 -38Q161 -32 162 -22T165 -4T165 4Q165 5 161 4T142 0Q110 0 94 18T78 60Z"></path><path stroke-width="0" id="E1921-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E1921-MJMAIN-3D" d="M56 347Q56 360 70 367H707Q722 359 722 347Q722 336 708 328L390 327H72Q56 332 56 347ZM56 153Q56 168 72 173H708Q722 163 722 153Q722 140 707 133H70Q56 140 56 153Z"></path><path stroke-width="0" id="E1921-MJMAIN-65" d="M28 218Q28 273 48 318T98 391T163 433T229 448Q282 448 320 430T378 
380T406 316T415 245Q415 238 408 231H126V216Q126 68 226 36Q246 30 270 30Q312 30 342 62Q359 79 369 104L379 128Q382 131 395 131H398Q415 131 415 121Q415 117 412 108Q393 53 349 21T250 -11Q155 -11 92 58T28 218ZM333 275Q322 403 238 411H236Q228 411 220 410T195 402T166 381T143 340T127 274V267H333V275Z"></path><path stroke-width="0" id="E1921-MJMAIN-78" d="M201 0Q189 3 102 3Q26 3 17 0H11V46H25Q48 47 67 52T96 61T121 78T139 96T160 122T180 150L226 210L168 288Q159 301 149 315T133 336T122 351T113 363T107 370T100 376T94 379T88 381T80 383Q74 383 44 385H16V431H23Q59 429 126 429Q219 429 229 431H237V385Q201 381 201 369Q201 367 211 353T239 315T268 274L272 270L297 304Q329 345 329 358Q329 364 327 369T322 376T317 380T310 384L307 385H302V431H309Q324 428 408 428Q487 428 493 431H499V385H492Q443 385 411 368Q394 360 377 341T312 257L296 236L358 151Q424 61 429 57T446 50Q464 46 499 46H516V0H510H502Q494 1 482 1T457 2T432 2T414 3Q403 3 377 3T327 1L304 0H295V46H298Q309 46 320 51T331 63Q331 65 291 120L250 175Q249 174 219 133T185 88Q181 83 181 74Q181 63 188 55T206 46Q208 46 208 23V0H201Z"></path><path stroke-width="0" id="E1921-MJMAIN-70" d="M36 -148H50Q89 -148 97 -134V-126Q97 -119 97 -107T97 -77T98 -38T98 6T98 55T98 106Q98 140 98 177T98 243T98 296T97 335T97 351Q94 370 83 376T38 385H20V408Q20 431 22 431L32 432Q42 433 61 434T98 436Q115 437 135 438T165 441T176 442H179V416L180 390L188 397Q247 441 326 441Q407 441 464 377T522 216Q522 115 457 52T310 -11Q242 -11 190 33L182 40V-45V-101Q182 -128 184 -134T195 -145Q216 -148 244 -148H260V-194H252L228 -193Q205 -192 178 -192T140 -191Q37 -191 28 -194H20V-148H36ZM424 218Q424 292 390 347T305 402Q234 402 182 337V98Q222 26 294 26Q345 26 384 80T424 218Z"></path><path stroke-width="0" id="E1921-MJMATHI-68" d="M137 683Q138 683 209 688T282 694Q294 694 294 685Q294 674 258 534Q220 386 220 383Q220 381 227 388Q288 442 357 442Q411 442 444 415T478 336Q478 285 440 178T402 50Q403 36 407 31T422 26Q450 26 474 56T513 138Q516 149 519 151T535 153Q555 153 555 145Q555 144 551 130Q535 71 
500 33Q466 -10 419 -10H414Q367 -10 346 17T325 74Q325 90 361 192T398 345Q398 404 354 404H349Q266 404 205 306L198 293L164 158Q132 28 127 16Q114 -11 83 -11Q69 -11 59 -2T48 16Q48 30 121 320L195 616Q195 629 188 632T149 637H128Q122 643 122 645T124 664Q129 683 137 683Z"></path><path stroke-width="0" id="E1921-MJMAIN-2C" d="M78 35T78 60T94 103T137 121Q165 121 187 96T210 8Q210 -27 201 -60T180 -117T154 -158T130 -185T117 -194Q113 -194 104 -185T95 -172Q95 -168 106 -156T131 -126T157 -76T173 -3V9L172 8Q170 7 167 6T161 3T152 1T140 0Q113 0 96 17Z"></path><path stroke-width="0" id="E1921-MJSZ1-2211" d="M61 748Q64 750 489 750H913L954 640Q965 609 976 579T993 533T999 516H979L959 517Q936 579 886 621T777 682Q724 700 655 705T436 710H319Q183 710 183 709Q186 706 348 484T511 259Q517 250 513 244L490 216Q466 188 420 134T330 27L149 -187Q149 -188 362 -188Q388 -188 436 -188T506 -189Q679 -189 778 -162T936 -43Q946 -27 959 6H999L913 -249L489 -250Q65 -250 62 -248Q56 -246 56 -239Q56 -234 118 -161Q186 -81 245 -11L428 206Q428 207 242 462L57 717L56 728Q56 744 61 748Z"></path><path stroke-width="0" id="E1921-MJMAIN-2032" d="M79 43Q73 43 52 49T30 61Q30 68 85 293T146 528Q161 560 198 560Q218 560 240 545T262 501Q262 496 260 486Q259 479 173 263T84 45T79 43Z"></path><path stroke-width="0" id="E1921-MJMAIN-2208" d="M84 250Q84 372 166 450T360 539Q361 539 377 539T419 540T469 540H568Q583 532 583 520Q583 511 570 501L466 500Q355 499 329 494Q280 482 242 458T183 409T147 354T129 306T124 272V270H568Q583 262 583 250T568 230H124V228Q124 207 134 177T167 112T231 48T328 7Q355 1 466 0H570Q583 -10 583 -20Q583 -32 568 -40H471Q464 -40 446 -40T417 -41Q262 -41 172 45Q84 127 84 250Z"></path><path stroke-width="0" id="E1921-MJCAL-53" d="M554 512Q536 512 536 522Q536 525 539 539T542 564Q542 588 528 604Q515 616 482 625T410 635Q374 635 349 624T312 594T295 561T290 532Q290 505 303 482T342 442T378 419T409 404Q435 391 451 383T494 357T535 323T562 282T574 231Q574 133 464 56T220 -22Q138 -22 78 21T18 123Q18 184 61 227T156 274Q178 274 178 
263Q178 260 177 258Q172 247 164 239T151 227T136 218L127 213L124 202Q118 186 118 163Q120 124 165 86T292 48Q374 48 423 86T473 186V193Q473 267 347 327Q268 364 239 389Q191 431 191 486Q191 547 242 600T356 679T470 705Q472 705 478 705T489 704Q551 704 596 682T642 610Q642 566 621 545Q592 516 554 512Z"></path><path stroke-width="0" id="E1921-MJCAL-41" d="M576 668Q576 688 606 708T660 728Q676 728 675 712V571Q675 409 688 252Q696 122 720 57Q722 53 723 50T728 46T732 43T737 41T743 39L754 45Q788 61 803 61Q819 61 819 47Q818 43 814 35Q799 15 755 -7T675 -30Q659 -30 648 -25T630 -8T621 11T614 34Q603 77 599 106T594 146T591 160V163H460L329 164L316 145Q241 35 196 -7T119 -50T59 -24T30 43Q30 75 46 100T74 125Q81 125 83 120T88 104T96 84Q118 57 151 57Q189 57 277 182Q432 400 542 625L559 659H567Q574 659 575 660T576 668ZM584 249Q579 333 577 386T575 473T574 520V581L563 560Q497 426 412 290L372 228L370 224H371L383 228L393 232H586L584 249Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><g transform="translate(41043,0)"><g id="mjx-eqn-1" transform="translate(0,21)"><use xlink:href="#E1921-MJMAIN-28"></use><use xlink:href="#E1921-MJMAIN-31" x="389" y="0"></use><use xlink:href="#E1921-MJMAIN-29" x="889" y="0"></use></g></g><g transform="translate(10604,0)"><g transform="translate(-19,0)"><g transform="translate(0,21)"><use xlink:href="#E1921-MJMATHI-3C0" x="0" y="0"></use><use xlink:href="#E1921-MJMAIN-28" x="573" y="0"></use><use xlink:href="#E1921-MJMATHI-61" x="962" y="0"></use><use xlink:href="#E1921-MJMAIN-2223" x="1768" y="0"></use><use xlink:href="#E1921-MJMATHI-73" x="2324" y="0"></use><use xlink:href="#E1921-MJMAIN-3B" x="2793" y="0"></use><use xlink:href="#E1921-MJMATHI-3B8" x="3238" y="0"></use><use xlink:href="#E1921-MJMAIN-29" x="3707" y="0"></use><use xlink:href="#E1921-MJMAIN-3D" x="4374" y="0"></use><g transform="translate(5152,0)"><g transform="translate(397,0)"><rect stroke="none" width="7724" height="60" x="0" 
y="220"></rect><g transform="translate(1159,693)"><use xlink:href="#E1921-MJMAIN-65"></use><use xlink:href="#E1921-MJMAIN-78" x="444" y="0"></use><use xlink:href="#E1921-MJMAIN-70" x="972" y="0"></use><use xlink:href="#E1921-MJMATHI-68" x="1694" y="0"></use><use xlink:href="#E1921-MJMAIN-28" x="2270" y="0"></use><use xlink:href="#E1921-MJMATHI-73" x="2659" y="0"></use><use xlink:href="#E1921-MJMAIN-2C" x="3128" y="0"></use><use xlink:href="#E1921-MJMATHI-61" x="3573" y="0"></use><use xlink:href="#E1921-MJMAIN-3B" x="4102" y="0"></use><use xlink:href="#E1921-MJMATHI-3B8" x="4547" y="0"></use><use xlink:href="#E1921-MJMAIN-29" x="5016" y="0"></use></g><g transform="translate(60,-694)"><use xlink:href="#E1921-MJSZ1-2211" x="0" y="0"></use><g transform="translate(1056,-286)"><use transform="scale(0.707)" xlink:href="#E1921-MJMATHI-61" x="0" y="0"></use><use transform="scale(0.5)" xlink:href="#E1921-MJMAIN-2032" x="748" y="408"></use></g><g transform="translate(1904,0)"><use xlink:href="#E1921-MJMAIN-65"></use><use xlink:href="#E1921-MJMAIN-78" x="444" y="0"></use><use xlink:href="#E1921-MJMAIN-70" x="972" y="0"></use></g><use xlink:href="#E1921-MJMATHI-68" x="3599" y="0"></use><use xlink:href="#E1921-MJMAIN-28" x="4175" y="0"></use><use xlink:href="#E1921-MJMATHI-73" x="4564" y="0"></use><use xlink:href="#E1921-MJMAIN-2C" x="5033" y="0"></use><g transform="translate(5478,0)"><use xlink:href="#E1921-MJMATHI-61" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1921-MJMAIN-2032" x="748" y="408"></use></g><use xlink:href="#E1921-MJMAIN-3B" x="6301" y="0"></use><use xlink:href="#E1921-MJMATHI-3B8" x="6746" y="0"></use><use xlink:href="#E1921-MJMAIN-29" x="7215" y="0"></use></g></g></g><use xlink:href="#E1921-MJMAIN-2C" x="13671" y="0"></use><use xlink:href="#E1921-MJMATHI-73" x="16116" y="0"></use><use xlink:href="#E1921-MJMAIN-2208" x="16863" y="0"></use><use xlink:href="#E1921-MJCAL-53" x="17808" y="0"></use><use xlink:href="#E1921-MJMAIN-2C" x="18450" 
y="0"></use><use xlink:href="#E1921-MJMATHI-61" x="18894" y="0"></use><use xlink:href="#E1921-MJMAIN-2208" x="19701" y="0"></use><use xlink:href="#E1921-MJCAL-41" x="20646" y="0"></use></g></g></g></g></svg></span></div><script type="math/tex; mode=display" id="MathJax-Element-819">\pi(a \mid s; \theta) = \frac{\exp h(s,a;\theta)}{\sum_{a'}\exp h(s,a';\theta)}\; , \qquad s \in \mathcal S, a \in \mathcal A</script></div></div><p><span>The action preference function can take many forms, such as a linear combination or an artificial neural network. Its parameters </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="1.089ex" height="1.939ex" viewBox="0 -749.6 469 834.7" role="img" focusable="false" style="vertical-align: -0.198ex;"><defs><path stroke-width="0" id="E2009-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E2009-MJMATHI-3B8" x="0" y="0"></use></g></svg></span><script type="math/tex">\theta</script><span> are usually updated by a gradient-based iterative algorithm, so the action preference function generally needs to be differentiable with respect to </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="1.089ex" height="1.939ex" viewBox="0 -749.6 469 834.7" role="img" focusable="false" style="vertical-align: -0.198ex;"><defs><path stroke-width="0" id="E2009-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 
254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E2009-MJMATHI-3B8" x="0" y="0"></use></g></svg></span><script type="math/tex">\theta</script><span>. We also need the gradient of the expected return with respect to </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="1.089ex" height="1.939ex" viewBox="0 -749.6 469 834.7" role="img" focusable="false" style="vertical-align: -0.198ex;"><defs><path stroke-width="0" id="E2009-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E2009-MJMATHI-3B8" x="0" y="0"></use></g></svg></span><script type="math/tex">\theta</script><span>, so that updating </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="1.089ex" height="1.939ex" viewBox="0 -749.6 469 834.7" role="img" focusable="false" style="vertical-align: -0.198ex;"><defs><path stroke-width="0" id="E2009-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 
36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E2009-MJMATHI-3B8" x="0" y="0"></use></g></svg></span><script type="math/tex">\theta</script><span> along the gradient direction increases the expected return. The </span><strong><span>policy gradient theorem</span></strong><span> relates the gradient of the expected return to the gradient of the policy, and it is the foundation of policy gradient methods.</span></p><p><span>In episodic tasks, the policy gradient theorem states that, for the policy </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="4.227ex" height="2.71ex" viewBox="0 -832.7 1820 1166.9" role="img" focusable="false" style="vertical-align: -0.776ex;"><defs><path stroke-width="0" id="E1996-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1996-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1996-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 
132Z"></path><path stroke-width="0" id="E1996-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1996-MJMATHI-3C0" x="0" y="0"></use><use xlink:href="#E1996-MJMAIN-28" x="573" y="0"></use><use xlink:href="#E1996-MJMATHI-3B8" x="962" y="0"></use><use xlink:href="#E1996-MJMAIN-29" x="1431" y="0"></use></g></svg></span><script type="math/tex">\pi(\theta)</script><span>, the gradient of the expected return </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="9.106ex" height="2.903ex" viewBox="0 -832.7 3920.5 1250" role="img" focusable="false" style="vertical-align: -0.969ex;"><defs><path stroke-width="0" id="E1946-MJMATHI-45" d="M492 213Q472 213 472 226Q472 230 477 250T482 285Q482 316 461 323T364 330H312Q311 328 277 192T243 52Q243 48 254 48T334 46Q428 46 458 48T518 61Q567 77 599 117T670 248Q680 270 683 272Q690 274 698 274Q718 274 718 261Q613 7 608 2Q605 0 322 0H133Q31 0 31 11Q31 13 34 25Q38 41 42 43T65 46Q92 46 125 49Q139 52 144 61Q146 66 215 342T285 622Q285 629 281 629Q273 632 228 634H197Q191 640 191 642T193 659Q197 676 203 680H757Q764 676 764 669Q764 664 751 557T737 447Q735 440 717 440H705Q698 445 698 453L701 476Q704 500 704 528Q704 558 697 578T678 609T643 625T596 632T532 634H485Q397 633 392 631Q388 629 386 622Q385 619 355 499T324 377Q347 376 372 376H398Q464 376 489 391T534 472Q538 488 540 490T557 493Q562 493 565 493T570 492T572 491T574 487T577 483L544 351Q511 218 508 216Q505 213 492 213Z"></path><path stroke-width="0" id="E1946-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 
284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1946-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1946-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E1946-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path><path stroke-width="0" id="E1946-MJMAIN-5B" d="M118 -250V750H255V710H158V-210H255V-250H118Z"></path><path stroke-width="0" id="E1946-MJMATHI-47" d="M50 252Q50 367 117 473T286 641T490 704Q580 704 633 653Q642 643 648 636T656 626L657 623Q660 623 684 649Q691 655 699 663T715 679T725 690L740 705H746Q760 705 760 698Q760 694 728 561Q692 422 692 421Q690 416 687 415T669 413H653Q647 419 647 422Q647 423 648 429T650 449T651 481Q651 552 619 605T510 659Q492 659 471 656T418 643T357 615T294 567T236 496T189 394T158 260Q156 242 156 221Q156 173 170 136T206 79T256 45T308 28T353 24Q407 24 452 47T514 106Q517 114 
529 161T541 214Q541 222 528 224T468 227H431Q425 233 425 235T427 254Q431 267 437 273H454Q494 271 594 271Q634 271 659 271T695 272T707 272Q721 272 721 263Q721 261 719 249Q714 230 709 228Q706 227 694 227Q674 227 653 224Q646 221 643 215T629 164Q620 131 614 108Q589 6 586 3Q584 1 581 1Q571 1 553 21T530 52Q530 53 528 52T522 47Q448 -22 322 -22Q201 -22 126 55T50 252Z"></path><path stroke-width="0" id="E1946-MJMAIN-30" d="M96 585Q152 666 249 666Q297 666 345 640T423 548Q460 465 460 320Q460 165 417 83Q397 41 362 16T301 -15T250 -22Q224 -22 198 -16T137 16T82 83Q39 165 39 320Q39 494 96 585ZM321 597Q291 629 250 629Q208 629 178 597Q153 571 145 525T137 333Q137 175 145 125T181 46Q209 16 250 16Q290 16 318 46Q347 76 354 130T362 333Q362 478 354 524T321 597Z"></path><path stroke-width="0" id="E1946-MJMAIN-5D" d="M22 710V750H159V-250H22V-210H119V710H22Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1946-MJMATHI-45" x="0" y="0"></use><g transform="translate(738,-186)"><use transform="scale(0.707)" xlink:href="#E1946-MJMATHI-3C0" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1946-MJMAIN-28" x="573" y="0"></use><use transform="scale(0.707)" xlink:href="#E1946-MJMATHI-3B8" x="962" y="0"></use><use transform="scale(0.707)" xlink:href="#E1946-MJMAIN-29" x="1431" y="0"></use></g><use xlink:href="#E1946-MJMAIN-5B" x="2124" y="0"></use><g transform="translate(2402,0)"><use xlink:href="#E1946-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1946-MJMAIN-30" x="1111" y="-213"></use></g><use xlink:href="#E1946-MJMAIN-5D" x="3642" y="0"></use></g></svg></span><script type="math/tex">E_{\pi(\theta)}[G_0]</script><span> with respect to the policy parameters </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="1.089ex" height="1.939ex" viewBox="0 -749.6 469 834.7" role="img" focusable="false" style="vertical-align: 
-0.198ex;"><defs><path stroke-width="0" id="E2009-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E2009-MJMATHI-3B8" x="0" y="0"></use></g></svg></span><script type="math/tex">\theta</script><span> is:</span></p><div contenteditable="false" spellcheck="false" class="mathjax-block md-end-block md-math-block md-rawblock" id="mathjax-n8" cid="n8" mdtype="math_block"><div class="md-rawblock-container md-math-container" tabindex="-1"><div class="MathJax_SVG_Display"><span class="MathJax_SVG" id="MathJax-Element-820-Frame" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="98.296ex" height="7.34ex" viewBox="0 -1829.4 42321.7 3160.5" role="img" focusable="false" style="vertical-align: -3.091ex; max-width: 100%;"><defs><path stroke-width="0" id="E1922-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1922-MJMAIN-32" d="M109 429Q82 429 66 447T50 491Q50 562 103 614T235 666Q326 666 387 610T449 465Q449 422 429 383T381 315T301 241Q265 210 201 149L142 93L218 92Q375 92 385 97Q392 99 409 186V189H449V186Q448 183 436 95T421 3V0H50V19V31Q50 38 56 46T86 81Q115 113 136 137Q145 147 170 174T204 211T233 244T261 278T284 308T305 340T320 369T333 401T340 431T343 464Q343 527 309 573T212 619Q179 619 154 602T119 
569T109 550Q109 549 114 549Q132 549 151 535T170 489Q170 464 154 447T109 429Z"></path><path stroke-width="0" id="E1922-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path><path stroke-width="0" id="E1922-MJMAIN-2207" d="M46 676Q46 679 51 683H781Q786 679 786 676Q786 674 617 326T444 -26Q439 -33 416 -33T388 -26Q385 -22 216 326T46 676ZM697 596Q697 597 445 597T193 596Q195 591 319 336T445 80L697 596Z"></path><path stroke-width="0" id="E1922-MJMATHI-45" d="M492 213Q472 213 472 226Q472 230 477 250T482 285Q482 316 461 323T364 330H312Q311 328 277 192T243 52Q243 48 254 48T334 46Q428 46 458 48T518 61Q567 77 599 117T670 248Q680 270 683 272Q690 274 698 274Q718 274 718 261Q613 7 608 2Q605 0 322 0H133Q31 0 31 11Q31 13 34 25Q38 41 42 43T65 46Q92 46 125 49Q139 52 144 61Q146 66 215 342T285 622Q285 629 281 629Q273 632 228 634H197Q191 640 191 642T193 659Q197 676 203 680H757Q764 676 764 669Q764 664 751 557T737 447Q735 440 717 440H705Q698 445 698 453L701 476Q704 500 704 528Q704 558 697 578T678 609T643 625T596 632T532 634H485Q397 633 392 631Q388 629 386 622Q385 619 355 499T324 377Q347 376 372 376H398Q464 376 489 391T534 472Q538 488 540 490T557 493Q562 493 565 493T570 492T572 491T574 487T577 483L544 351Q511 218 508 216Q505 213 492 213Z"></path><path stroke-width="0" id="E1922-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path 
stroke-width="0" id="E1922-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E1922-MJMAIN-5B" d="M118 -250V750H255V710H158V-210H255V-250H118Z"></path><path stroke-width="0" id="E1922-MJMATHI-47" d="M50 252Q50 367 117 473T286 641T490 704Q580 704 633 653Q642 643 648 636T656 626L657 623Q660 623 684 649Q691 655 699 663T715 679T725 690L740 705H746Q760 705 760 698Q760 694 728 561Q692 422 692 421Q690 416 687 415T669 413H653Q647 419 647 422Q647 423 648 429T650 449T651 481Q651 552 619 605T510 659Q492 659 471 656T418 643T357 615T294 567T236 496T189 394T158 260Q156 242 156 221Q156 173 170 136T206 79T256 45T308 28T353 24Q407 24 452 47T514 106Q517 114 529 161T541 214Q541 222 528 224T468 227H431Q425 233 425 235T427 254Q431 267 437 273H454Q494 271 594 271Q634 271 659 271T695 272T707 272Q721 272 721 263Q721 261 719 249Q714 230 709 228Q706 227 694 227Q674 227 653 224Q646 221 643 215T629 164Q620 131 614 108Q589 6 586 3Q584 1 581 1Q571 1 553 21T530 52Q530 53 528 52T522 47Q448 -22 322 -22Q201 -22 126 55T50 252Z"></path><path stroke-width="0" id="E1922-MJMAIN-30" d="M96 585Q152 666 249 666Q297 666 345 640T423 548Q460 465 460 320Q460 165 417 83Q397 41 362 16T301 -15T250 -22Q224 -22 198 -16T137 16T82 83Q39 165 39 320Q39 494 96 585ZM321 597Q291 629 250 629Q208 629 178 597Q153 571 145 525T137 333Q137 175 145 125T181 46Q209 16 250 16Q290 16 318 46Q347 76 354 130T362 333Q362 478 354 524T321 597Z"></path><path stroke-width="0" id="E1922-MJMAIN-5D" d="M22 710V750H159V-250H22V-210H119V710H22Z"></path><path stroke-width="0" id="E1922-MJMAIN-3D" d="M56 347Q56 360 70 367H707Q722 359 722 347Q722 336 708 328L390 
327H72Q56 332 56 347ZM56 153Q56 168 72 173H708Q722 163 722 153Q722 140 707 133H70Q56 140 56 153Z"></path><path stroke-width="0" id="E1922-MJSZ2-2211" d="M60 948Q63 950 665 950H1267L1325 815Q1384 677 1388 669H1348L1341 683Q1320 724 1285 761Q1235 809 1174 838T1033 881T882 898T699 902H574H543H251L259 891Q722 258 724 252Q725 250 724 246Q721 243 460 -56L196 -356Q196 -357 407 -357Q459 -357 548 -357T676 -358Q812 -358 896 -353T1063 -332T1204 -283T1307 -196Q1328 -170 1348 -124H1388Q1388 -125 1381 -145T1356 -210T1325 -294L1267 -449L666 -450Q64 -450 61 -448Q55 -446 55 -439Q55 -437 57 -433L590 177Q590 178 557 222T452 366T322 544L56 909L55 924Q55 945 60 948Z"></path><path stroke-width="0" id="E1922-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E1922-MJMAIN-2B" d="M56 237T56 250T70 270H369V420L370 570Q380 583 389 583Q402 583 409 568V270H707Q722 262 722 250T707 230H409V-68Q401 -82 391 -82H389H387Q375 -82 369 -68V230H70Q56 237 56 250Z"></path><path stroke-width="0" id="E1922-MJMAIN-221E" d="M55 217Q55 305 111 373T254 442Q342 442 419 381Q457 350 493 303L507 284L514 294Q618 442 747 442Q833 442 888 374T944 214Q944 128 889 59T743 -11Q657 -11 580 50Q542 81 506 128L492 147L485 137Q381 -11 252 -11Q166 -11 111 57T55 217ZM907 217Q907 285 869 341T761 397Q740 397 720 392T682 378T648 359T619 335T594 310T574 285T559 263T548 246L543 238L574 198Q605 158 622 138T664 94T714 61T765 51Q827 51 867 100T907 217ZM92 214Q92 145 131 89T239 33Q357 33 456 193L425 233Q364 312 334 337Q285 380 233 380Q171 380 132 331T92 
214Z"></path><path stroke-width="0" id="E1922-MJMATHI-3B3" d="M31 249Q11 249 11 258Q11 275 26 304T66 365T129 418T206 441Q233 441 239 440Q287 429 318 386T371 255Q385 195 385 170Q385 166 386 166L398 193Q418 244 443 300T486 391T508 430Q510 431 524 431H537Q543 425 543 422Q543 418 522 378T463 251T391 71Q385 55 378 6T357 -100Q341 -165 330 -190T303 -216Q286 -216 286 -188Q286 -138 340 32L346 51L347 69Q348 79 348 100Q348 257 291 317Q251 355 196 355Q148 355 108 329T51 260Q49 251 47 251Q45 249 31 249Z"></path><path stroke-width="0" id="E1922-MJMAIN-6C" d="M42 46H56Q95 46 103 60V68Q103 77 103 91T103 124T104 167T104 217T104 272T104 329Q104 366 104 407T104 482T104 542T103 586T103 603Q100 622 89 628T44 637H26V660Q26 683 28 683L38 684Q48 685 67 686T104 688Q121 689 141 690T171 693T182 694H185V379Q185 62 186 60Q190 52 198 49Q219 46 247 46H263V0H255L232 1Q209 2 183 2T145 3T107 3T57 1L34 0H26V46H42Z"></path><path stroke-width="0" id="E1922-MJMAIN-6E" d="M41 46H55Q94 46 102 60V68Q102 77 102 91T102 122T103 161T103 203Q103 234 103 269T102 328V351Q99 370 88 376T43 385H25V408Q25 431 27 431L37 432Q47 433 65 434T102 436Q119 437 138 438T167 441T178 442H181V402Q181 364 182 364T187 369T199 384T218 402T247 421T285 437Q305 442 336 442Q450 438 463 329Q464 322 464 190V104Q464 66 466 59T477 49Q498 46 526 46H542V0H534L510 1Q487 2 460 2T422 3Q319 3 310 0H302V46H318Q379 46 379 62Q380 64 380 200Q379 335 378 343Q372 371 358 385T334 402T308 404Q263 404 229 370Q202 343 195 315T187 232V168V108Q187 78 188 68T191 55T200 49Q221 46 249 46H265V0H257L234 1Q210 2 183 2T145 3Q42 3 33 0H25V46H41Z"></path><path stroke-width="0" id="E1922-MJMATHI-41" d="M208 74Q208 50 254 46Q272 46 272 35Q272 34 270 22Q267 8 264 4T251 0Q249 0 239 0T205 1T141 2Q70 2 50 0H42Q35 7 35 11Q37 38 48 46H62Q132 49 164 96Q170 102 345 401T523 704Q530 716 547 716H555H572Q578 707 578 706L606 383Q634 60 636 57Q641 46 701 46Q726 46 726 36Q726 34 723 22Q720 7 718 4T704 0Q701 0 690 0T651 1T578 2Q484 2 455 0H443Q437 6 437 9T439 27Q443 40 445 43L449 
46H469Q523 49 533 63L521 213H283L249 155Q208 86 208 74ZM516 260Q516 271 504 416T490 562L463 519Q447 492 400 412L310 260L413 259Q516 259 516 260Z"></path><path stroke-width="0" id="E1922-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path stroke-width="0" id="E1922-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 462Q176 523 208 573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 577T585 459T569 456Q549 456 549 465Q549 471 550 475Q550 478 551 494T553 520Q553 554 544 579T526 616T501 641Q465 662 419 662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 -18T65 -22Q52 -22 52 -14Q52 -11 110 221Q112 227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 153Q144 114 160 87T203 47T255 29T308 24Z"></path><path stroke-width="0" id="E1922-MJMAIN-3B" d="M78 370Q78 394 95 412T138 430Q162 430 180 414T199 371Q199 346 182 328T139 310T96 327T78 370ZM78 60Q78 85 94 103T137 121Q202 121 202 8Q202 -44 183 -94T144 -169T118 -194Q115 -194 106 -186T95 -174Q94 -171 107 -155T137 -107T160 -38Q161 -32 162 -22T165 -4T165 4Q165 5 161 4T142 0Q110 0 94 18T78 60Z"></path><path stroke-width="0" id="E1922-MJSZ4-28" d="M758 -1237T758 -1240T752 -1249H736Q718 -1249 717 -1248Q711 -1245 672 -1199Q237 -706 237 251T672 1700Q697 1730 716 1749Q718 1750 735 1750H752Q758 1744 758 1741Q758 1737 740 1713T689 1644T619 1537T540 1380T463 1176Q348 802 348 251Q348 -242 441 -599T744 -1218Q758 -1237 758 -1240Z"></path><path stroke-width="0" id="E1922-MJSZ4-29" d="M33 1741Q33 1750 51 1750H60H65Q73 1750 81 1743T119 1700Q554 1207 554 251Q554 -707 119 -1199Q76 -1250 66 -1250Q65 -1250 62 -1250T56 -1249Q55 -1249 53 -1249T49 -1250Q33 -1250 33 -1239Q33 -1236 50 
-1214T98 -1150T163 -1052T238 -910T311 -727Q443 -335 443 251Q443 402 436 532T405 831T339 1142T224 1438T50 1716Q33 1737 33 1741Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><g transform="translate(41043,0)"><g id="mjx-eqn-eq:2"><use xlink:href="#E1922-MJMAIN-28"></use><use xlink:href="#E1922-MJMAIN-32" x="389" y="0"></use><use xlink:href="#E1922-MJMAIN-29" x="889" y="0"></use></g></g><g transform="translate(11621,0)"><g transform="translate(-19,0)"><use xlink:href="#E1922-MJMAIN-2207" x="0" y="0"></use><g transform="translate(833,0)"><use xlink:href="#E1922-MJMATHI-45" x="0" y="0"></use><g transform="translate(738,-186)"><use transform="scale(0.707)" xlink:href="#E1922-MJMATHI-3C0" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1922-MJMAIN-28" x="573" y="0"></use><use transform="scale(0.707)" xlink:href="#E1922-MJMATHI-3B8" x="962" y="0"></use><use transform="scale(0.707)" xlink:href="#E1922-MJMAIN-29" x="1431" y="0"></use></g></g><use xlink:href="#E1922-MJMAIN-5B" x="2957" y="0"></use><g transform="translate(3235,0)"><use xlink:href="#E1922-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1922-MJMAIN-30" x="1111" y="-213"></use></g><use xlink:href="#E1922-MJMAIN-5D" x="4475" y="0"></use><use xlink:href="#E1922-MJMAIN-3D" x="5031" y="0"></use><use xlink:href="#E1922-MJMATHI-45" x="6087" y="0"></use><g transform="translate(7017,0)"><use xlink:href="#E1922-MJSZ4-28"></use><g transform="translate(792,0)"><use xlink:href="#E1922-MJSZ2-2211" x="0" y="0"></use><g transform="translate(142,-1088)"><use transform="scale(0.707)" xlink:href="#E1922-MJMATHI-74" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1922-MJMAIN-3D" x="361" y="0"></use><use transform="scale(0.707)" xlink:href="#E1922-MJMAIN-30" x="1139" y="0"></use></g><g transform="translate(93,1150)"><use transform="scale(0.707)" xlink:href="#E1922-MJMAIN-2B" x="0" y="0"></use><use 
transform="scale(0.707)" xlink:href="#E1922-MJMAIN-221E" x="778" y="0"></use></g></g><g transform="translate(2402,0)"><use xlink:href="#E1922-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1922-MJMATHI-74" x="778" y="583"></use></g><g transform="translate(3308,0)"><use xlink:href="#E1922-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1922-MJMATHI-74" x="1111" y="-213"></use></g><use xlink:href="#E1922-MJMAIN-2207" x="4449" y="0"></use><g transform="translate(5449,0)"><use xlink:href="#E1922-MJMAIN-6C"></use><use xlink:href="#E1922-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1922-MJMATHI-3C0" x="6450" y="0"></use><use xlink:href="#E1922-MJMAIN-28" x="7023" y="0"></use><g transform="translate(7412,0)"><use xlink:href="#E1922-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1922-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1922-MJMAIN-2223" x="8795" y="0"></use><g transform="translate(9350,0)"><use xlink:href="#E1922-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1922-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1922-MJMAIN-3B" x="10319" y="0"></use><use xlink:href="#E1922-MJMATHI-3B8" x="10763" y="0"></use><use xlink:href="#E1922-MJMAIN-29" x="11232" y="0"></use><use xlink:href="#E1922-MJSZ4-29" x="11621" y="0"></use></g></g></g></g></svg></span></div><script type="math/tex; mode=display" id="MathJax-Element-820">\nabla E_{\pi(\theta)}[G_0] = E \left(\sum_{t=0}^{+\infty} \gamma^t G_t \nabla \ln \pi(A_t \mid S_t; \theta) \right)
\label{eq:2}</script></div></div><p><span>Proof: for the policy </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="4.227ex" height="2.71ex" viewBox="0 -832.7 1820 1166.9" role="img" focusable="false" style="vertical-align: -0.776ex;"><defs><path stroke-width="0" id="E1996-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1996-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1996-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E1996-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 
0)"><use xlink:href="#E1996-MJMATHI-3C0" x="0" y="0"></use><use xlink:href="#E1996-MJMAIN-28" x="573" y="0"></use><use xlink:href="#E1996-MJMATHI-3B8" x="962" y="0"></use><use xlink:href="#E1996-MJMAIN-29" x="1431" y="0"></use></g></svg></span><script type="math/tex">\pi(\theta)</script><span>, taking the gradient of each of its Bellman expectation equations gives:</span></p><div contenteditable="false" spellcheck="false" class="mathjax-block md-end-block md-math-block md-rawblock" id="mathjax-n10" cid="n10" mdtype="math_block"><div class="md-rawblock-container md-math-container" tabindex="-1"><div class="MathJax_SVG_Display" style="text-align: center;"><span class="MathJax_SVG" id="MathJax-Element-821-Frame" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="96.825ex" height="14.865ex" viewBox="0 -3449.2 41688.6 6400" role="img" focusable="false" style="vertical-align: -6.748ex; margin-bottom: -0.106ex; max-width: 100%;"><defs><path stroke-width="0" id="E1923-MJMAIN-2207" d="M46 676Q46 679 51 683H781Q786 679 786 676Q786 674 617 326T444 -26Q439 -33 416 -33T388 -26Q385 -22 216 326T46 676ZM697 596Q697 597 445 597T193 596Q195 591 319 336T445 80L697 596Z"></path><path stroke-width="0" id="E1923-MJMATHI-76" d="M173 380Q173 405 154 405Q130 405 104 376T61 287Q60 286 59 284T58 281T56 279T53 278T49 278T41 278H27Q21 284 21 287Q21 294 29 316T53 368T97 419T160 441Q202 441 225 417T249 361Q249 344 246 335Q246 329 231 291T200 202T182 113Q182 86 187 69Q200 26 250 26Q287 26 319 60T369 139T398 222T409 277Q409 300 401 317T383 343T365 361T357 383Q357 405 376 424T417 443Q436 443 451 425T467 367Q467 340 455 284T418 159T347 40T241 -11Q177 -11 139 22Q102 54 102 117Q102 148 110 181T151 298Q173 362 173 380Z"></path><path stroke-width="0" id="E1923-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 
402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1923-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1923-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E1923-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path><path stroke-width="0" id="E1923-MJMATHI-73" d="M131 289Q131 321 147 354T203 415T300 442Q362 442 390 415T419 355Q419 323 402 308T364 292Q351 292 340 300T328 326Q328 342 337 354T354 372T367 378Q368 378 368 379Q368 382 361 388T336 399T297 405Q249 405 227 379T204 326Q204 301 223 291T278 274T330 259Q396 230 396 163Q396 135 385 107T352 51T289 7T195 -10Q118 -10 86 19T53 87Q53 126 74 143T118 160Q133 160 146 151T160 120Q160 94 142 76T111 58Q109 57 108 57T107 55Q108 52 115 47T146 34T201 27Q237 27 263 38T301 66T318 97T323 122Q323 150 302 164T254 181T195 196T148 231Q131 256 131 289Z"></path><path stroke-width="0" id="E1923-MJMAIN-3D" d="M56 347Q56 360 70 367H707Q722 359 722 347Q722 336 708 
328L390 327H72Q56 332 56 347ZM56 153Q56 168 72 173H708Q722 163 722 153Q722 140 707 133H70Q56 140 56 153Z"></path><path stroke-width="0" id="E1923-MJSZ2-2211" d="M60 948Q63 950 665 950H1267L1325 815Q1384 677 1388 669H1348L1341 683Q1320 724 1285 761Q1235 809 1174 838T1033 881T882 898T699 902H574H543H251L259 891Q722 258 724 252Q725 250 724 246Q721 243 460 -56L196 -356Q196 -357 407 -357Q459 -357 548 -357T676 -358Q812 -358 896 -353T1063 -332T1204 -283T1307 -196Q1328 -170 1348 -124H1388Q1388 -125 1381 -145T1356 -210T1325 -294L1267 -449L666 -450Q64 -450 61 -448Q55 -446 55 -439Q55 -437 57 -433L590 177Q590 178 557 222T452 366T322 544L56 909L55 924Q55 945 60 948Z"></path><path stroke-width="0" id="E1923-MJMATHI-61" d="M33 157Q33 258 109 349T280 441Q331 441 370 392Q386 422 416 422Q429 422 439 414T449 394Q449 381 412 234T374 68Q374 43 381 35T402 26Q411 27 422 35Q443 55 463 131Q469 151 473 152Q475 153 483 153H487Q506 153 506 144Q506 138 501 117T481 63T449 13Q436 0 417 -8Q409 -10 393 -10Q359 -10 336 5T306 36L300 51Q299 52 296 50Q294 48 292 46Q233 -10 172 -10Q117 -10 75 30T33 157ZM351 328Q351 334 346 350T323 385T277 405Q242 405 210 374T160 293Q131 214 119 129Q119 126 119 118T118 106Q118 61 136 44T179 26Q217 26 254 59T298 110Q300 114 325 217T351 328Z"></path><path stroke-width="0" id="E1923-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path stroke-width="0" id="E1923-MJMAIN-3B" d="M78 370Q78 394 95 412T138 430Q162 430 180 414T199 371Q199 346 182 328T139 310T96 327T78 370ZM78 60Q78 85 94 103T137 121Q202 121 202 8Q202 -44 183 -94T144 -169T118 -194Q115 -194 106 -186T95 -174Q94 -171 107 -155T137 -107T160 -38Q161 -32 162 -22T165 -4T165 4Q165 5 161 4T142 0Q110 0 94 18T78 60Z"></path><path stroke-width="0" id="E1923-MJMATHI-71" d="M33 157Q33 258 109 349T280 441Q340 441 372 389Q373 390 377 395T388 406T404 418Q438 442 450 442Q454 442 457 439T460 434Q460 425 391 149Q320 -135 320 -139Q320 -147 365 -148H390Q396 
-156 396 -157T393 -175Q389 -188 383 -194H370Q339 -192 262 -192Q234 -192 211 -192T174 -192T157 -193Q143 -193 143 -185Q143 -182 145 -170Q149 -154 152 -151T172 -148Q220 -148 230 -141Q238 -136 258 -53T279 32Q279 33 272 29Q224 -10 172 -10Q117 -10 75 30T33 157ZM352 326Q329 405 277 405Q242 405 210 374T160 293Q131 214 119 129Q119 126 119 118T118 106Q118 61 136 44T179 26Q233 26 290 98L298 109L352 326Z"></path><path stroke-width="0" id="E1923-MJMAIN-2C" d="M78 35T78 60T94 103T137 121Q165 121 187 96T210 8Q210 -27 201 -60T180 -117T154 -158T130 -185T117 -194Q113 -194 104 -185T95 -172Q95 -168 106 -156T131 -126T157 -76T173 -3V9L172 8Q170 7 167 6T161 3T152 1T140 0Q113 0 96 17Z"></path><path stroke-width="0" id="E1923-MJSZ4-28" d="M758 -1237T758 -1240T752 -1249H736Q718 -1249 717 -1248Q711 -1245 672 -1199Q237 -706 237 251T672 1700Q697 1730 716 1749Q718 1750 735 1750H752Q758 1744 758 1741Q758 1737 740 1713T689 1644T619 1537T540 1380T463 1176Q348 802 348 251Q348 -242 441 -599T744 -1218Q758 -1237 758 -1240Z"></path><path stroke-width="0" id="E1923-MJSZ4-29" d="M33 1741Q33 1750 51 1750H60H65Q73 1750 81 1743T119 1700Q554 1207 554 251Q554 -707 119 -1199Q76 -1250 66 -1250Q65 -1250 62 -1250T56 -1249Q55 -1249 53 -1249T49 -1250Q33 -1250 33 -1239Q33 -1236 50 -1214T98 -1150T163 -1052T238 -910T311 -727Q443 -335 443 251Q443 402 436 532T405 831T339 1142T224 1438T50 1716Q33 1737 33 1741Z"></path><path stroke-width="0" id="E1923-MJMAIN-2B" d="M56 237T56 250T70 270H369V420L370 570Q380 583 389 583Q402 583 409 568V270H707Q722 262 722 250T707 230H409V-68Q401 -82 391 -82H389H387Q375 -82 369 -68V230H70Q56 237 56 250Z"></path><path stroke-width="0" id="E1923-MJMATHI-72" d="M21 287Q22 290 23 295T28 317T38 348T53 381T73 411T99 433T132 442Q161 442 183 430T214 408T225 388Q227 382 228 382T236 389Q284 441 347 441H350Q398 441 422 400Q430 381 430 363Q430 333 417 315T391 292T366 288Q346 288 334 299T322 328Q322 376 378 392Q356 405 342 405Q286 405 239 331Q229 315 224 298T190 165Q156 25 151 16Q138 -11 108 -11Q95 -11 
87 -5T76 7T74 17Q74 30 114 189T154 366Q154 405 128 405Q107 405 92 377T68 316T57 280Q55 278 41 278H27Q21 284 21 287Z"></path><path stroke-width="0" id="E1923-MJMATHI-3B3" d="M31 249Q11 249 11 258Q11 275 26 304T66 365T129 418T206 441Q233 441 239 440Q287 429 318 386T371 255Q385 195 385 170Q385 166 386 166L398 193Q418 244 443 300T486 391T508 430Q510 431 524 431H537Q543 425 543 422Q543 418 522 378T463 251T391 71Q385 55 378 6T357 -100Q341 -165 330 -190T303 -216Q286 -216 286 -188Q286 -138 340 32L346 51L347 69Q348 79 348 100Q348 257 291 317Q251 355 196 355Q148 355 108 329T51 260Q49 251 47 251Q45 249 31 249Z"></path><path stroke-width="0" id="E1923-MJMAIN-2032" d="M79 43Q73 43 52 49T30 61Q30 68 85 293T146 528Q161 560 198 560Q218 560 240 545T262 501Q262 496 260 486Q259 479 173 263T84 45T79 43Z"></path><path stroke-width="0" id="E1923-MJMATHI-70" d="M23 287Q24 290 25 295T30 317T40 348T55 381T75 411T101 433T134 442Q209 442 230 378L240 387Q302 442 358 442Q423 442 460 395T497 281Q497 173 421 82T249 -10Q227 -10 210 -4Q199 1 187 11T168 28L161 36Q160 35 139 -51T118 -138Q118 -144 126 -145T163 -148H188Q194 -155 194 -157T191 -175Q188 -187 185 -190T172 -194Q170 -194 161 -194T127 -193T65 -192Q-5 -192 -24 -194H-32Q-39 -187 -39 -183Q-37 -156 -26 -148H-6Q28 -147 33 -136Q36 -130 94 103T155 350Q156 355 156 364Q156 405 131 405Q109 405 94 377T71 316T59 280Q57 278 43 278H29Q23 284 23 287ZM178 102Q200 26 252 26Q282 26 310 49T356 107Q374 141 392 215T411 325V331Q411 405 350 405Q339 405 328 402T306 393T286 380T269 365T254 350T243 336T235 326L232 322Q232 321 229 308T218 264T204 212Q178 106 178 102Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><g transform="translate(167,0)"><g transform="translate(-19,0)"><g transform="translate(0,1655)"><use xlink:href="#E1923-MJMAIN-2207" x="0" y="0"></use><g transform="translate(833,0)"><use xlink:href="#E1923-MJMATHI-76" x="0" y="0"></use><g transform="translate(485,-186)"><use 
transform="scale(0.707)" xlink:href="#E1923-MJMATHI-3C0" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1923-MJMAIN-28" x="573" y="0"></use><use transform="scale(0.707)" xlink:href="#E1923-MJMATHI-3B8" x="962" y="0"></use><use transform="scale(0.707)" xlink:href="#E1923-MJMAIN-29" x="1431" y="0"></use></g></g><use xlink:href="#E1923-MJMAIN-28" x="2704" y="0"></use><use xlink:href="#E1923-MJMATHI-73" x="3093" y="0"></use><use xlink:href="#E1923-MJMAIN-29" x="3562" y="0"></use><use xlink:href="#E1923-MJMAIN-3D" x="4229" y="0"></use><use xlink:href="#E1923-MJMAIN-2207" x="5285" y="0"></use><g transform="translate(6285,0)"><use xlink:href="#E1923-MJSZ4-28"></use><g transform="translate(792,0)"><use xlink:href="#E1923-MJSZ2-2211" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1923-MJMATHI-61" x="756" y="-1485"></use></g><use xlink:href="#E1923-MJMATHI-3C0" x="2402" y="0"></use><use xlink:href="#E1923-MJMAIN-28" x="2975" y="0"></use><use xlink:href="#E1923-MJMATHI-61" x="3364" y="0"></use><use xlink:href="#E1923-MJMAIN-2223" x="4171" y="0"></use><use xlink:href="#E1923-MJMATHI-73" x="4727" y="0"></use><use xlink:href="#E1923-MJMAIN-3B" x="5196" y="0"></use><use xlink:href="#E1923-MJMATHI-3B8" x="5640" y="0"></use><use xlink:href="#E1923-MJMAIN-29" x="6109" y="0"></use><g transform="translate(6498,0)"><use xlink:href="#E1923-MJMATHI-71" x="0" y="0"></use><g transform="translate(446,-186)"><use transform="scale(0.707)" xlink:href="#E1923-MJMATHI-3C0" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1923-MJMAIN-28" x="573" y="0"></use><use transform="scale(0.707)" xlink:href="#E1923-MJMATHI-3B8" x="962" y="0"></use><use transform="scale(0.707)" xlink:href="#E1923-MJMAIN-29" x="1431" y="0"></use></g></g><use xlink:href="#E1923-MJMAIN-28" x="8331" y="0"></use><use xlink:href="#E1923-MJMATHI-73" x="8720" y="0"></use><use xlink:href="#E1923-MJMAIN-2C" x="9189" y="0"></use><use xlink:href="#E1923-MJMATHI-61" x="9634" y="0"></use><use 
xlink:href="#E1923-MJMAIN-29" x="10163" y="0"></use><use xlink:href="#E1923-MJSZ4-29" x="10552" y="0"></use></g><use xlink:href="#E1923-MJMAIN-3D" x="17907" y="0"></use><g transform="translate(18963,0)"><use xlink:href="#E1923-MJSZ2-2211" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1923-MJMATHI-61" x="756" y="-1485"></use></g><g transform="translate(20573,0)"><use xlink:href="#E1923-MJMATHI-71" x="0" y="0"></use><g transform="translate(446,-186)"><use transform="scale(0.707)" xlink:href="#E1923-MJMATHI-3C0" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1923-MJMAIN-28" x="573" y="0"></use><use transform="scale(0.707)" xlink:href="#E1923-MJMATHI-3B8" x="962" y="0"></use><use transform="scale(0.707)" xlink:href="#E1923-MJMAIN-29" x="1431" y="0"></use></g></g><use xlink:href="#E1923-MJMAIN-28" x="22406" y="0"></use><use xlink:href="#E1923-MJMATHI-73" x="22795" y="0"></use><use xlink:href="#E1923-MJMAIN-2C" x="23264" y="0"></use><use xlink:href="#E1923-MJMATHI-61" x="23709" y="0"></use><use xlink:href="#E1923-MJMAIN-29" x="24238" y="0"></use><use xlink:href="#E1923-MJMAIN-2207" x="24627" y="0"></use><use xlink:href="#E1923-MJMATHI-3C0" x="25460" y="0"></use><use xlink:href="#E1923-MJMAIN-28" x="26033" y="0"></use><use xlink:href="#E1923-MJMATHI-61" x="26422" y="0"></use><use xlink:href="#E1923-MJMAIN-2223" x="27229" y="0"></use><use xlink:href="#E1923-MJMATHI-73" x="27785" y="0"></use><use xlink:href="#E1923-MJMAIN-3B" x="28254" y="0"></use><use xlink:href="#E1923-MJMATHI-3B8" x="28698" y="0"></use><use xlink:href="#E1923-MJMAIN-29" x="29167" y="0"></use><use xlink:href="#E1923-MJMAIN-2B" x="29778" y="0"></use><g transform="translate(30779,0)"><use xlink:href="#E1923-MJSZ2-2211" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1923-MJMATHI-61" x="756" y="-1485"></use></g><use xlink:href="#E1923-MJMATHI-3C0" x="32389" y="0"></use><use xlink:href="#E1923-MJMAIN-28" x="32962" y="0"></use><use xlink:href="#E1923-MJMATHI-61" 
x="33351" y="0"></use><use xlink:href="#E1923-MJMAIN-2223" x="34158" y="0"></use><use xlink:href="#E1923-MJMATHI-73" x="34714" y="0"></use><use xlink:href="#E1923-MJMAIN-3B" x="35183" y="0"></use><use xlink:href="#E1923-MJMATHI-3B8" x="35628" y="0"></use><use xlink:href="#E1923-MJMAIN-29" x="36097" y="0"></use><use xlink:href="#E1923-MJMAIN-2207" x="36486" y="0"></use><g transform="translate(37319,0)"><use xlink:href="#E1923-MJMATHI-71" x="0" y="0"></use><g transform="translate(446,-186)"><use transform="scale(0.707)" xlink:href="#E1923-MJMATHI-3C0" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1923-MJMAIN-28" x="573" y="0"></use><use transform="scale(0.707)" xlink:href="#E1923-MJMATHI-3B8" x="962" y="0"></use><use transform="scale(0.707)" xlink:href="#E1923-MJMAIN-29" x="1431" y="0"></use></g></g><use xlink:href="#E1923-MJMAIN-28" x="39151" y="0"></use><use xlink:href="#E1923-MJMATHI-73" x="39540" y="0"></use><use xlink:href="#E1923-MJMAIN-2C" x="40009" y="0"></use><use xlink:href="#E1923-MJMATHI-61" x="40454" y="0"></use><use xlink:href="#E1923-MJMAIN-29" x="40983" y="0"></use></g><g transform="translate(0,-1645)"><use xlink:href="#E1923-MJMAIN-2207" x="0" y="0"></use><g transform="translate(833,0)"><use xlink:href="#E1923-MJMATHI-71" x="0" y="0"></use><g transform="translate(446,-186)"><use transform="scale(0.707)" xlink:href="#E1923-MJMATHI-3C0" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1923-MJMAIN-28" x="573" y="0"></use><use transform="scale(0.707)" xlink:href="#E1923-MJMATHI-3B8" x="962" y="0"></use><use transform="scale(0.707)" xlink:href="#E1923-MJMAIN-29" x="1431" y="0"></use></g></g><use xlink:href="#E1923-MJMAIN-28" x="2665" y="0"></use><use xlink:href="#E1923-MJMATHI-73" x="3054" y="0"></use><use xlink:href="#E1923-MJMAIN-2C" x="3523" y="0"></use><use xlink:href="#E1923-MJMATHI-61" x="3968" y="0"></use><use xlink:href="#E1923-MJMAIN-29" x="4497" y="0"></use><use xlink:href="#E1923-MJMAIN-3D" x="5164" y="0"></use><use 
xlink:href="#E1923-MJMAIN-2207" x="6220" y="0"></use><g transform="translate(7219,0)"><use xlink:href="#E1923-MJSZ4-28"></use><use xlink:href="#E1923-MJMATHI-72" x="792" y="0"></use><use xlink:href="#E1923-MJMAIN-28" x="1243" y="0"></use><use xlink:href="#E1923-MJMATHI-73" x="1632" y="0"></use><use xlink:href="#E1923-MJMAIN-2C" x="2101" y="0"></use><use xlink:href="#E1923-MJMATHI-61" x="2545" y="0"></use><use xlink:href="#E1923-MJMAIN-29" x="3074" y="0"></use><use xlink:href="#E1923-MJMAIN-2B" x="3685" y="0"></use><use xlink:href="#E1923-MJMATHI-3B3" x="4686" y="0"></use><g transform="translate(5395,0)"><use xlink:href="#E1923-MJSZ2-2211" x="0" y="0"></use><g transform="translate(452,-1154)"><use transform="scale(0.707)" xlink:href="#E1923-MJMATHI-73" x="0" y="0"></use><use transform="scale(0.5)" xlink:href="#E1923-MJMAIN-2032" x="663" y="513"></use></g></g><use xlink:href="#E1923-MJMATHI-70" x="7006" y="0"></use><use xlink:href="#E1923-MJMAIN-28" x="7509" y="0"></use><g transform="translate(7898,0)"><use xlink:href="#E1923-MJMATHI-73" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1923-MJMAIN-2032" x="663" y="583"></use></g><use xlink:href="#E1923-MJMAIN-2223" x="8939" y="0"></use><use xlink:href="#E1923-MJMATHI-73" x="9495" y="0"></use><use xlink:href="#E1923-MJMAIN-2C" x="9964" y="0"></use><use xlink:href="#E1923-MJMATHI-61" x="10409" y="0"></use><use xlink:href="#E1923-MJMAIN-29" x="10938" y="0"></use><g transform="translate(11327,0)"><use xlink:href="#E1923-MJMATHI-76" x="0" y="0"></use><g transform="translate(485,-186)"><use transform="scale(0.707)" xlink:href="#E1923-MJMATHI-3C0" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1923-MJMAIN-28" x="573" y="0"></use><use transform="scale(0.707)" xlink:href="#E1923-MJMATHI-3B8" x="962" y="0"></use><use transform="scale(0.707)" xlink:href="#E1923-MJMAIN-29" x="1431" y="0"></use></g></g><use xlink:href="#E1923-MJMAIN-28" x="13199" y="0"></use><g transform="translate(13588,0)"><use 
xlink:href="#E1923-MJMATHI-73" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1923-MJMAIN-2032" x="663" y="583"></use></g><use xlink:href="#E1923-MJMAIN-29" x="14351" y="0"></use><use xlink:href="#E1923-MJSZ4-29" x="14740" y="0"></use></g><use xlink:href="#E1923-MJMAIN-3D" x="23030" y="0"></use><use xlink:href="#E1923-MJMATHI-3B3" x="24085" y="0"></use><g transform="translate(24795,0)"><use xlink:href="#E1923-MJSZ2-2211" x="0" y="0"></use><g transform="translate(452,-1154)"><use transform="scale(0.707)" xlink:href="#E1923-MJMATHI-73" x="0" y="0"></use><use transform="scale(0.5)" xlink:href="#E1923-MJMAIN-2032" x="663" y="513"></use></g></g><use xlink:href="#E1923-MJMATHI-70" x="26406" y="0"></use><use xlink:href="#E1923-MJMAIN-28" x="26909" y="0"></use><g transform="translate(27298,0)"><use xlink:href="#E1923-MJMATHI-73" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1923-MJMAIN-2032" x="663" y="583"></use></g><use xlink:href="#E1923-MJMAIN-2223" x="28339" y="0"></use><use xlink:href="#E1923-MJMATHI-73" x="28895" y="0"></use><use xlink:href="#E1923-MJMAIN-2C" x="29364" y="0"></use><use xlink:href="#E1923-MJMATHI-61" x="29808" y="0"></use><use xlink:href="#E1923-MJMAIN-29" x="30337" y="0"></use><use xlink:href="#E1923-MJMAIN-2207" x="30726" y="0"></use><g transform="translate(31559,0)"><use xlink:href="#E1923-MJMATHI-76" x="0" y="0"></use><g transform="translate(485,-186)"><use transform="scale(0.707)" xlink:href="#E1923-MJMATHI-3C0" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1923-MJMAIN-28" x="573" y="0"></use><use transform="scale(0.707)" xlink:href="#E1923-MJMATHI-3B8" x="962" y="0"></use><use transform="scale(0.707)" xlink:href="#E1923-MJMAIN-29" x="1431" y="0"></use></g></g><use xlink:href="#E1923-MJMAIN-28" x="33431" y="0"></use><g transform="translate(33820,0)"><use xlink:href="#E1923-MJMATHI-73" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1923-MJMAIN-2032" x="663" y="583"></use></g><use 
xlink:href="#E1923-MJMAIN-29" x="34584" y="0"></use></g></g></g></g></svg></span></div><script type="math/tex; mode=display" id="MathJax-Element-821">\begin{split}
&\nabla v_{\pi(\theta)}(s) = \nabla\left(\sum_{a}\pi(a \mid s;\theta)q_{\pi(\theta)}(s,a) \right) = \sum_a q_{\pi(\theta)}(s,a) \nabla \pi(a \mid s;\theta) + \sum_a \pi(a \mid s;\theta) \nabla q_{\pi(\theta)}(s,a) \\
&\nabla q_{\pi(\theta)}(s,a) = \nabla\left(r(s,a) + \gamma\sum_{s'}p(s' \mid s,a)v_{\pi(\theta)}(s') \right) = \gamma\sum_{s'}p(s' \mid s,a) \nabla v_{\pi(\theta)}(s') 
\end{split}</script></div></div><p><span>Substituting </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="11.35ex" height="2.903ex" viewBox="0 -832.7 4886.6 1250" role="img" focusable="false" style="vertical-align: -0.969ex;"><defs><path stroke-width="0" id="E1949-MJMAIN-2207" d="M46 676Q46 679 51 683H781Q786 679 786 676Q786 674 617 326T444 -26Q439 -33 416 -33T388 -26Q385 -22 216 326T46 676ZM697 596Q697 597 445 597T193 596Q195 591 319 336T445 80L697 596Z"></path><path stroke-width="0" id="E1949-MJMATHI-71" d="M33 157Q33 258 109 349T280 441Q340 441 372 389Q373 390 377 395T388 406T404 418Q438 442 450 442Q454 442 457 439T460 434Q460 425 391 149Q320 -135 320 -139Q320 -147 365 -148H390Q396 -156 396 -157T393 -175Q389 -188 383 -194H370Q339 -192 262 -192Q234 -192 211 -192T174 -192T157 -193Q143 -193 143 -185Q143 -182 145 -170Q149 -154 152 -151T172 -148Q220 -148 230 -141Q238 -136 258 -53T279 32Q279 33 272 29Q224 -10 172 -10Q117 -10 75 30T33 157ZM352 326Q329 405 277 405Q242 405 210 374T160 293Q131 214 119 129Q119 126 119 118T118 106Q118 61 136 44T179 26Q233 26 290 98L298 109L352 326Z"></path><path stroke-width="0" id="E1949-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1949-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 
250Z"></path><path stroke-width="0" id="E1949-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E1949-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path><path stroke-width="0" id="E1949-MJMATHI-73" d="M131 289Q131 321 147 354T203 415T300 442Q362 442 390 415T419 355Q419 323 402 308T364 292Q351 292 340 300T328 326Q328 342 337 354T354 372T367 378Q368 378 368 379Q368 382 361 388T336 399T297 405Q249 405 227 379T204 326Q204 301 223 291T278 274T330 259Q396 230 396 163Q396 135 385 107T352 51T289 7T195 -10Q118 -10 86 19T53 87Q53 126 74 143T118 160Q133 160 146 151T160 120Q160 94 142 76T111 58Q109 57 108 57T107 55Q108 52 115 47T146 34T201 27Q237 27 263 38T301 66T318 97T323 122Q323 150 302 164T254 181T195 196T148 231Q131 256 131 289Z"></path><path stroke-width="0" id="E1949-MJMAIN-2C" d="M78 35T78 60T94 103T137 121Q165 121 187 96T210 8Q210 -27 201 -60T180 -117T154 -158T130 -185T117 -194Q113 -194 104 -185T95 -172Q95 -168 106 -156T131 -126T157 -76T173 -3V9L172 8Q170 7 167 6T161 3T152 1T140 0Q113 0 96 17Z"></path><path stroke-width="0" id="E1949-MJMATHI-61" d="M33 157Q33 258 109 349T280 441Q331 441 370 392Q386 422 416 422Q429 422 439 414T449 394Q449 381 412 234T374 68Q374 43 381 35T402 26Q411 27 422 35Q443 55 463 131Q469 151 473 152Q475 153 483 153H487Q506 153 506 144Q506 138 501 117T481 63T449 13Q436 0 417 -8Q409 -10 393 -10Q359 -10 336 5T306 36L300 51Q299 52 296 50Q294 48 292 
46Q233 -10 172 -10Q117 -10 75 30T33 157ZM351 328Q351 334 346 350T323 385T277 405Q242 405 210 374T160 293Q131 214 119 129Q119 126 119 118T118 106Q118 61 136 44T179 26Q217 26 254 59T298 110Q300 114 325 217T351 328Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1949-MJMAIN-2207" x="0" y="0"></use><g transform="translate(833,0)"><use xlink:href="#E1949-MJMATHI-71" x="0" y="0"></use><g transform="translate(446,-186)"><use transform="scale(0.707)" xlink:href="#E1949-MJMATHI-3C0" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1949-MJMAIN-28" x="573" y="0"></use><use transform="scale(0.707)" xlink:href="#E1949-MJMATHI-3B8" x="962" y="0"></use><use transform="scale(0.707)" xlink:href="#E1949-MJMAIN-29" x="1431" y="0"></use></g></g><use xlink:href="#E1949-MJMAIN-28" x="2665" y="0"></use><use xlink:href="#E1949-MJMATHI-73" x="3054" y="0"></use><use xlink:href="#E1949-MJMAIN-2C" x="3523" y="0"></use><use xlink:href="#E1949-MJMATHI-61" x="3968" y="0"></use><use xlink:href="#E1949-MJMAIN-29" x="4497" y="0"></use></g></svg></span><script type="math/tex">\nabla q_{\pi(\theta)}(s,a)</script><span> into </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="9.179ex" height="2.903ex" viewBox="0 -832.7 3951.9 1250" role="img" focusable="false" style="vertical-align: -0.969ex;"><defs><path stroke-width="0" id="E1951-MJMAIN-2207" d="M46 676Q46 679 51 683H781Q786 679 786 676Q786 674 617 326T444 -26Q439 -33 416 -33T388 -26Q385 -22 216 326T46 676ZM697 596Q697 597 445 597T193 596Q195 591 319 336T445 80L697 596Z"></path><path stroke-width="0" id="E1951-MJMATHI-76" d="M173 380Q173 405 154 405Q130 405 104 376T61 287Q60 286 59 284T58 281T56 279T53 278T49 278T41 278H27Q21 284 21 287Q21 294 29 316T53 368T97 419T160 441Q202 441 225 417T249 361Q249 344 246 335Q246 329 231 291T200 202T182 113Q182 86 187 69Q200 
26 250 26Q287 26 319 60T369 139T398 222T409 277Q409 300 401 317T383 343T365 361T357 383Q357 405 376 424T417 443Q436 443 451 425T467 367Q467 340 455 284T418 159T347 40T241 -11Q177 -11 139 22Q102 54 102 117Q102 148 110 181T151 298Q173 362 173 380Z"></path><path stroke-width="0" id="E1951-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1951-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1951-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E1951-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path><path stroke-width="0" id="E1951-MJMATHI-73" d="M131 289Q131 321 147 354T203 415T300 442Q362 442 390 415T419 355Q419 323 402 308T364 292Q351 292 340 300T328 326Q328 342 337 
354T354 372T367 378Q368 378 368 379Q368 382 361 388T336 399T297 405Q249 405 227 379T204 326Q204 301 223 291T278 274T330 259Q396 230 396 163Q396 135 385 107T352 51T289 7T195 -10Q118 -10 86 19T53 87Q53 126 74 143T118 160Q133 160 146 151T160 120Q160 94 142 76T111 58Q109 57 108 57T107 55Q108 52 115 47T146 34T201 27Q237 27 263 38T301 66T318 97T323 122Q323 150 302 164T254 181T195 196T148 231Q131 256 131 289Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1951-MJMAIN-2207" x="0" y="0"></use><g transform="translate(833,0)"><use xlink:href="#E1951-MJMATHI-76" x="0" y="0"></use><g transform="translate(485,-186)"><use transform="scale(0.707)" xlink:href="#E1951-MJMATHI-3C0" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1951-MJMAIN-28" x="573" y="0"></use><use transform="scale(0.707)" xlink:href="#E1951-MJMATHI-3B8" x="962" y="0"></use><use transform="scale(0.707)" xlink:href="#E1951-MJMAIN-29" x="1431" y="0"></use></g></g><use xlink:href="#E1951-MJMAIN-28" x="2704" y="0"></use><use xlink:href="#E1951-MJMATHI-73" x="3093" y="0"></use><use xlink:href="#E1951-MJMAIN-29" x="3562" y="0"></use></g></svg></span><script type="math/tex">\nabla v_{\pi(\theta)}(s)</script><span> and taking the expectation of </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="9.179ex" height="2.903ex" viewBox="0 -832.7 3951.9 1250" role="img" focusable="false" style="vertical-align: -0.969ex;"><defs><path stroke-width="0" id="E1951-MJMAIN-2207" d="M46 676Q46 679 51 683H781Q786 679 786 676Q786 674 617 326T444 -26Q439 -33 416 -33T388 -26Q385 -22 216 326T46 676ZM697 596Q697 597 445 597T193 596Q195 591 319 336T445 80L697 596Z"></path><path stroke-width="0" id="E1951-MJMATHI-76" d="M173 380Q173 405 154 405Q130 405 104 376T61 287Q60 286 59 284T58 281T56 279T53 278T49 278T41 278H27Q21 284 21 287Q21 294 29 316T53 368T97 419T160 441Q202 441 225 
417T249 361Q249 344 246 335Q246 329 231 291T200 202T182 113Q182 86 187 69Q200 26 250 26Q287 26 319 60T369 139T398 222T409 277Q409 300 401 317T383 343T365 361T357 383Q357 405 376 424T417 443Q436 443 451 425T467 367Q467 340 455 284T418 159T347 40T241 -11Q177 -11 139 22Q102 54 102 117Q102 148 110 181T151 298Q173 362 173 380Z"></path><path stroke-width="0" id="E1951-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1951-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1951-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E1951-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path><path stroke-width="0" id="E1951-MJMATHI-73" d="M131 289Q131 321 147 354T203 415T300 442Q362 442 
390 415T419 355Q419 323 402 308T364 292Q351 292 340 300T328 326Q328 342 337 354T354 372T367 378Q368 378 368 379Q368 382 361 388T336 399T297 405Q249 405 227 379T204 326Q204 301 223 291T278 274T330 259Q396 230 396 163Q396 135 385 107T352 51T289 7T195 -10Q118 -10 86 19T53 87Q53 126 74 143T118 160Q133 160 146 151T160 120Q160 94 142 76T111 58Q109 57 108 57T107 55Q108 52 115 47T146 34T201 27Q237 27 263 38T301 66T318 97T323 122Q323 150 302 164T254 181T195 196T148 231Q131 256 131 289Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1951-MJMAIN-2207" x="0" y="0"></use><g transform="translate(833,0)"><use xlink:href="#E1951-MJMATHI-76" x="0" y="0"></use><g transform="translate(485,-186)"><use transform="scale(0.707)" xlink:href="#E1951-MJMATHI-3C0" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1951-MJMAIN-28" x="573" y="0"></use><use transform="scale(0.707)" xlink:href="#E1951-MJMATHI-3B8" x="962" y="0"></use><use transform="scale(0.707)" xlink:href="#E1951-MJMAIN-29" x="1431" y="0"></use></g></g><use xlink:href="#E1951-MJMAIN-28" x="2704" y="0"></use><use xlink:href="#E1951-MJMATHI-73" x="3093" y="0"></use><use xlink:href="#E1951-MJMAIN-29" x="3562" y="0"></use></g></svg></span><script type="math/tex">\nabla v_{\pi(\theta)}(s)</script><span>, we obtain:</span></p><div contenteditable="false" spellcheck="false" class="mathjax-block md-end-block md-math-block md-rawblock" id="mathjax-n12" cid="n12" mdtype="math_block"><div class="md-rawblock-container md-math-container" contenteditable="false" tabindex="-1">
						<div class="MathJax_SVG_Display" style="text-align: center;"><span class="MathJax_SVG" id="MathJax-Element-909-Frame" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="98.844ex" height="39.559ex" viewBox="0 -8765.4 42557.8 17032.3" role="img" focusable="false" style="vertical-align: -19.201ex; max-width: 100%;"><defs><path stroke-width="0" id="E2032-MJMATHI-45" d="M492 213Q472 213 472 226Q472 230 477 250T482 285Q482 316 461 323T364 330H312Q311 328 277 192T243 52Q243 48 254 48T334 46Q428 46 458 48T518 61Q567 77 599 117T670 248Q680 270 683 272Q690 274 698 274Q718 274 718 261Q613 7 608 2Q605 0 322 0H133Q31 0 31 11Q31 13 34 25Q38 41 42 43T65 46Q92 46 125 49Q139 52 144 61Q146 66 215 342T285 622Q285 629 281 629Q273 632 228 634H197Q191 640 191 642T193 659Q197 676 203 680H757Q764 676 764 669Q764 664 751 557T737 447Q735 440 717 440H705Q698 445 698 453L701 476Q704 500 704 528Q704 558 697 578T678 609T643 625T596 632T532 634H485Q397 633 392 631Q388 629 386 622Q385 619 355 499T324 377Q347 376 372 376H398Q464 376 489 391T534 472Q538 488 540 490T557 493Q562 493 565 493T570 492T572 491T574 487T577 483L544 351Q511 218 508 216Q505 213 492 213Z"></path><path stroke-width="0" id="E2032-MJMAIN-5B" d="M118 -250V750H255V710H158V-210H255V-250H118Z"></path><path stroke-width="0" id="E2032-MJMAIN-2207" d="M46 676Q46 679 51 683H781Q786 679 786 676Q786 674 617 326T444 -26Q439 -33 416 -33T388 -26Q385 -22 216 326T46 676ZM697 596Q697 597 445 597T193 596Q195 591 319 336T445 80L697 596Z"></path><path stroke-width="0" id="E2032-MJMATHI-76" d="M173 380Q173 405 154 405Q130 405 104 376T61 287Q60 286 59 284T58 281T56 279T53 278T49 278T41 278H27Q21 284 21 287Q21 294 29 316T53 368T97 419T160 441Q202 441 225 417T249 361Q249 344 246 335Q246 329 231 291T200 202T182 113Q182 86 187 69Q200 26 250 26Q287 26 319 60T369 139T398 222T409 277Q409 300 401 317T383 343T365 361T357 383Q357 405 376 424T417 443Q436 443 451 425T467 367Q467 340 455 
284T418 159T347 40T241 -11Q177 -11 139 22Q102 54 102 117Q102 148 110 181T151 298Q173 362 173 380Z"></path><path stroke-width="0" id="E2032-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E2032-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E2032-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E2032-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path><path stroke-width="0" id="E2032-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 462Q176 523 208 573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 577T585 459T569 456Q549 456 549 465Q549 
471 550 475Q550 478 551 494T553 520Q553 554 544 579T526 616T501 641Q465 662 419 662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 -18T65 -22Q52 -22 52 -14Q52 -11 110 221Q112 227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 153Q144 114 160 87T203 47T255 29T308 24Z"></path><path stroke-width="0" id="E2032-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E2032-MJMAIN-5D" d="M22 710V750H159V-250H22V-210H119V710H22Z"></path><path stroke-width="0" id="E2032-MJMAIN-3D" d="M56 347Q56 360 70 367H707Q722 359 722 347Q722 336 708 328L390 327H72Q56 332 56 347ZM56 153Q56 168 72 173H708Q722 163 722 153Q722 140 707 133H70Q56 140 56 153Z"></path><path stroke-width="0" id="E2032-MJSZ2-2211" d="M60 948Q63 950 665 950H1267L1325 815Q1384 677 1388 669H1348L1341 683Q1320 724 1285 761Q1235 809 1174 838T1033 881T882 898T699 902H574H543H251L259 891Q722 258 724 252Q725 250 724 246Q721 243 460 -56L196 -356Q196 -357 407 -357Q459 -357 548 -357T676 -358Q812 -358 896 -353T1063 -332T1204 -283T1307 -196Q1328 -170 1348 -124H1388Q1388 -125 1381 -145T1356 -210T1325 -294L1267 -449L666 -450Q64 -450 61 -448Q55 -446 55 -439Q55 -437 57 -433L590 177Q590 178 557 222T452 366T322 544L56 909L55 924Q55 945 60 948Z"></path><path stroke-width="0" id="E2032-MJMATHI-73" d="M131 289Q131 321 147 354T203 415T300 442Q362 442 390 415T419 355Q419 323 402 308T364 292Q351 292 340 
300T328 326Q328 342 337 354T354 372T367 378Q368 378 368 379Q368 382 361 388T336 399T297 405Q249 405 227 379T204 326Q204 301 223 291T278 274T330 259Q396 230 396 163Q396 135 385 107T352 51T289 7T195 -10Q118 -10 86 19T53 87Q53 126 74 143T118 160Q133 160 146 151T160 120Q160 94 142 76T111 58Q109 57 108 57T107 55Q108 52 115 47T146 34T201 27Q237 27 263 38T301 66T318 97T323 122Q323 150 302 164T254 181T195 196T148 231Q131 256 131 289Z"></path><path stroke-width="0" id="E2032-MJMAIN-50" d="M130 622Q123 629 119 631T103 634T60 637H27V683H214Q237 683 276 683T331 684Q419 684 471 671T567 616Q624 563 624 489Q624 421 573 372T451 307Q429 302 328 301H234V181Q234 62 237 58Q245 47 304 46H337V0H326Q305 3 182 3Q47 3 38 0H27V46H60Q102 47 111 49T130 61V622ZM507 488Q507 514 506 528T500 564T483 597T450 620T397 635Q385 637 307 637H286Q237 637 234 628Q231 624 231 483V342H302H339Q390 342 423 349T481 382Q507 411 507 488Z"></path><path stroke-width="0" id="E2032-MJMAIN-72" d="M36 46H50Q89 46 97 60V68Q97 77 97 91T98 122T98 161T98 203Q98 234 98 269T98 328L97 351Q94 370 83 376T38 385H20V408Q20 431 22 431L32 432Q42 433 60 434T96 436Q112 437 131 438T160 441T171 442H174V373Q213 441 271 441H277Q322 441 343 419T364 373Q364 352 351 337T313 322Q288 322 276 338T263 372Q263 381 265 388T270 400T273 405Q271 407 250 401Q234 393 226 386Q179 341 179 207V154Q179 141 179 127T179 101T180 81T180 66V61Q181 59 183 57T188 54T193 51T200 49T207 48T216 47T225 47T235 46T245 46H276V0H267Q249 3 140 3Q37 3 28 0H20V46H36Z"></path><path stroke-width="0" id="E2032-MJMATHI-61" d="M33 157Q33 258 109 349T280 441Q331 441 370 392Q386 422 416 422Q429 422 439 414T449 394Q449 381 412 234T374 68Q374 43 381 35T402 26Q411 27 422 35Q443 55 463 131Q469 151 473 152Q475 153 483 153H487Q506 153 506 144Q506 138 501 117T481 63T449 13Q436 0 417 -8Q409 -10 393 -10Q359 -10 336 5T306 36L300 51Q299 52 296 50Q294 48 292 46Q233 -10 172 -10Q117 -10 75 30T33 157ZM351 328Q351 334 346 350T323 385T277 405Q242 405 210 374T160 293Q131 214 119 129Q119 126 119 
118T118 106Q118 61 136 44T179 26Q217 26 254 59T298 110Q300 114 325 217T351 328Z"></path><path stroke-width="0" id="E2032-MJMATHI-71" d="M33 157Q33 258 109 349T280 441Q340 441 372 389Q373 390 377 395T388 406T404 418Q438 442 450 442Q454 442 457 439T460 434Q460 425 391 149Q320 -135 320 -139Q320 -147 365 -148H390Q396 -156 396 -157T393 -175Q389 -188 383 -194H370Q339 -192 262 -192Q234 -192 211 -192T174 -192T157 -193Q143 -193 143 -185Q143 -182 145 -170Q149 -154 152 -151T172 -148Q220 -148 230 -141Q238 -136 258 -53T279 32Q279 33 272 29Q224 -10 172 -10Q117 -10 75 30T33 157ZM352 326Q329 405 277 405Q242 405 210 374T160 293Q131 214 119 129Q119 126 119 118T118 106Q118 61 136 44T179 26Q233 26 290 98L298 109L352 326Z"></path><path stroke-width="0" id="E2032-MJMAIN-2C" d="M78 35T78 60T94 103T137 121Q165 121 187 96T210 8Q210 -27 201 -60T180 -117T154 -158T130 -185T117 -194Q113 -194 104 -185T95 -172Q95 -168 106 -156T131 -126T157 -76T173 -3V9L172 8Q170 7 167 6T161 3T152 1T140 0Q113 0 96 17Z"></path><path stroke-width="0" id="E2032-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path stroke-width="0" id="E2032-MJMAIN-3B" d="M78 370Q78 394 95 412T138 430Q162 430 180 414T199 371Q199 346 182 328T139 310T96 327T78 370ZM78 60Q78 85 94 103T137 121Q202 121 202 8Q202 -44 183 -94T144 -169T118 -194Q115 -194 106 -186T95 -174Q94 -171 107 -155T137 -107T160 -38Q161 -32 162 -22T165 -4T165 4Q165 5 161 4T142 0Q110 0 94 18T78 60Z"></path><path stroke-width="0" id="E2032-MJMAIN-2B" d="M56 237T56 250T70 270H369V420L370 570Q380 583 389 583Q402 583 409 568V270H707Q722 262 722 250T707 230H409V-68Q401 -82 391 -82H389H387Q375 -82 369 -68V230H70Q56 237 56 250Z"></path><path stroke-width="0" id="E2032-MJMATHI-3B3" d="M31 249Q11 249 11 258Q11 275 26 304T66 365T129 418T206 441Q233 441 239 440Q287 429 318 386T371 255Q385 195 385 170Q385 166 386 166L398 193Q418 244 443 300T486 391T508 430Q510 431 524 431H537Q543 425 543 422Q543 418 522 
378T463 251T391 71Q385 55 378 6T357 -100Q341 -165 330 -190T303 -216Q286 -216 286 -188Q286 -138 340 32L346 51L347 69Q348 79 348 100Q348 257 291 317Q251 355 196 355Q148 355 108 329T51 260Q49 251 47 251Q45 249 31 249Z"></path><path stroke-width="0" id="E2032-MJMAIN-2032" d="M79 43Q73 43 52 49T30 61Q30 68 85 293T146 528Q161 560 198 560Q218 560 240 545T262 501Q262 496 260 486Q259 479 173 263T84 45T79 43Z"></path><path stroke-width="0" id="E2032-MJMATHI-70" d="M23 287Q24 290 25 295T30 317T40 348T55 381T75 411T101 433T134 442Q209 442 230 378L240 387Q302 442 358 442Q423 442 460 395T497 281Q497 173 421 82T249 -10Q227 -10 210 -4Q199 1 187 11T168 28L161 36Q160 35 139 -51T118 -138Q118 -144 126 -145T163 -148H188Q194 -155 194 -157T191 -175Q188 -187 185 -190T172 -194Q170 -194 161 -194T127 -193T65 -192Q-5 -192 -24 -194H-32Q-39 -187 -39 -183Q-37 -156 -26 -148H-6Q28 -147 33 -136Q36 -130 94 103T155 350Q156 355 156 364Q156 405 131 405Q109 405 94 377T71 316T59 280Q57 278 43 278H29Q23 284 23 287ZM178 102Q200 26 252 26Q282 26 310 49T356 107Q374 141 392 215T411 325V331Q411 405 350 405Q339 405 328 402T306 393T286 380T269 365T254 350T243 336T235 326L232 322Q232 321 229 308T218 264T204 212Q178 106 178 102Z"></path><path stroke-width="0" id="E2032-MJSZ4-5B" d="M269 -1249V1750H577V1677H342V-1176H577V-1249H269Z"></path><path stroke-width="0" id="E2032-MJSZ4-5D" d="M5 1677V1750H313V-1249H5V-1176H240V1677H5Z"></path><path stroke-width="0" id="E2032-MJMAIN-31" d="M213 578L200 573Q186 568 160 563T102 556H83V602H102Q149 604 189 617T245 641T273 663Q275 666 285 666Q294 666 302 660V361L303 61Q310 54 315 52T339 48T401 46H427V0H416Q395 3 257 3Q121 3 100 0H88V46H114Q136 46 152 46T177 47T193 50T201 52T207 57T213 61V578Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><g transform="translate(167,0)"><g transform="translate(-19,0)"><g transform="translate(0,7771)"><use xlink:href="#E2032-MJMATHI-45" x="0" y="0"></use><use 
xlink:href="#E2032-MJMAIN-5B" x="764" y="0"></use><use xlink:href="#E2032-MJMAIN-2207" x="1042" y="0"></use><g transform="translate(1875,0)"><use xlink:href="#E2032-MJMATHI-76" x="0" y="0"></use><g transform="translate(485,-186)"><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-3C0" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMAIN-28" x="573" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-3B8" x="962" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMAIN-29" x="1431" y="0"></use></g></g><use xlink:href="#E2032-MJMAIN-28" x="3746" y="0"></use><g transform="translate(4135,0)"><use xlink:href="#E2032-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E2032-MJMAIN-29" x="5104" y="0"></use><use xlink:href="#E2032-MJMAIN-5D" x="5493" y="0"></use><use xlink:href="#E2032-MJMAIN-3D" x="6048" y="0"></use><g transform="translate(7104,0)"><use xlink:href="#E2032-MJSZ2-2211" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-73" x="786" y="-1485"></use></g><g transform="translate(8715,0)"><use xlink:href="#E2032-MJMAIN-50" x="0" y="0"></use><use xlink:href="#E2032-MJMAIN-72" x="681" y="0"></use></g><use xlink:href="#E2032-MJMAIN-5B" x="9788" y="0"></use><g transform="translate(10066,0)"><use xlink:href="#E2032-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E2032-MJMAIN-3D" x="11312" y="0"></use><use xlink:href="#E2032-MJMATHI-73" x="12368" y="0"></use><use xlink:href="#E2032-MJMAIN-5D" x="12837" y="0"></use><use xlink:href="#E2032-MJMAIN-2207" x="13115" y="0"></use><g transform="translate(13948,0)"><use xlink:href="#E2032-MJMATHI-76" x="0" y="0"></use><g transform="translate(485,-186)"><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-3C0" x="0" y="0"></use><use transform="scale(0.707)" 
xlink:href="#E2032-MJMAIN-28" x="573" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-3B8" x="962" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMAIN-29" x="1431" y="0"></use></g></g><use xlink:href="#E2032-MJMAIN-28" x="15820" y="0"></use><use xlink:href="#E2032-MJMATHI-73" x="16209" y="0"></use><use xlink:href="#E2032-MJMAIN-29" x="16678" y="0"></use></g><g transform="translate(0,4565)"><use xlink:href="#E2032-MJMAIN-3D" x="277" y="0"></use><g transform="translate(1333,0)"><use xlink:href="#E2032-MJSZ2-2211" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-73" x="786" y="-1485"></use></g><g transform="translate(2944,0)"><use xlink:href="#E2032-MJMAIN-50" x="0" y="0"></use><use xlink:href="#E2032-MJMAIN-72" x="681" y="0"></use></g><use xlink:href="#E2032-MJMAIN-5B" x="4017" y="0"></use><g transform="translate(4295,0)"><use xlink:href="#E2032-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E2032-MJMAIN-3D" x="5541" y="0"></use><use xlink:href="#E2032-MJMATHI-73" x="6597" y="0"></use><use xlink:href="#E2032-MJMAIN-5D" x="7066" y="0"></use><g transform="translate(7510,0)"><use xlink:href="#E2032-MJSZ4-5B"></use><g transform="translate(583,0)"><use xlink:href="#E2032-MJSZ2-2211" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-61" x="756" y="-1485"></use></g><g transform="translate(2193,0)"><use xlink:href="#E2032-MJMATHI-71" x="0" y="0"></use><g transform="translate(446,-186)"><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-3C0" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMAIN-28" x="573" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-3B8" x="962" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMAIN-29" x="1431" y="0"></use></g></g><use xlink:href="#E2032-MJMAIN-28" x="4026" y="0"></use><use xlink:href="#E2032-MJMATHI-73" 
x="4415" y="0"></use><use xlink:href="#E2032-MJMAIN-2C" x="4884" y="0"></use><use xlink:href="#E2032-MJMATHI-61" x="5329" y="0"></use><use xlink:href="#E2032-MJMAIN-29" x="5858" y="0"></use><use xlink:href="#E2032-MJMAIN-2207" x="6247" y="0"></use><use xlink:href="#E2032-MJMATHI-3C0" x="7080" y="0"></use><use xlink:href="#E2032-MJMAIN-28" x="7653" y="0"></use><use xlink:href="#E2032-MJMATHI-61" x="8042" y="0"></use><use xlink:href="#E2032-MJMAIN-2223" x="8849" y="0"></use><use xlink:href="#E2032-MJMATHI-73" x="9404" y="0"></use><use xlink:href="#E2032-MJMAIN-3B" x="9873" y="0"></use><use xlink:href="#E2032-MJMATHI-3B8" x="10318" y="0"></use><use xlink:href="#E2032-MJMAIN-29" x="10787" y="0"></use><use xlink:href="#E2032-MJMAIN-2B" x="11398" y="0"></use><g transform="translate(12398,0)"><use xlink:href="#E2032-MJSZ2-2211" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-61" x="756" y="-1485"></use></g><use xlink:href="#E2032-MJMATHI-3C0" x="14009" y="0"></use><use xlink:href="#E2032-MJMAIN-28" x="14582" y="0"></use><use xlink:href="#E2032-MJMATHI-61" x="14971" y="0"></use><use xlink:href="#E2032-MJMAIN-2223" x="15778" y="0"></use><use xlink:href="#E2032-MJMATHI-73" x="16334" y="0"></use><use xlink:href="#E2032-MJMAIN-3B" x="16803" y="0"></use><use xlink:href="#E2032-MJMATHI-3B8" x="17247" y="0"></use><use xlink:href="#E2032-MJMAIN-29" x="17716" y="0"></use><use xlink:href="#E2032-MJMATHI-3B3" x="18105" y="0"></use><g transform="translate(18815,0)"><use xlink:href="#E2032-MJSZ2-2211" x="0" y="0"></use><g transform="translate(452,-1154)"><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-73" x="0" y="0"></use><use transform="scale(0.5)" xlink:href="#E2032-MJMAIN-2032" x="663" y="513"></use></g></g><use xlink:href="#E2032-MJMATHI-70" x="20426" y="0"></use><use xlink:href="#E2032-MJMAIN-28" x="20929" y="0"></use><g transform="translate(21318,0)"><use xlink:href="#E2032-MJMATHI-73" x="0" y="0"></use><use transform="scale(0.707)" 
xlink:href="#E2032-MJMAIN-2032" x="663" y="583"></use></g><use xlink:href="#E2032-MJMAIN-2223" x="22359" y="0"></use><use xlink:href="#E2032-MJMATHI-73" x="22915" y="0"></use><use xlink:href="#E2032-MJMAIN-2C" x="23384" y="0"></use><use xlink:href="#E2032-MJMATHI-61" x="23828" y="0"></use><use xlink:href="#E2032-MJMAIN-29" x="24357" y="0"></use><use xlink:href="#E2032-MJMAIN-2207" x="24746" y="0"></use><g transform="translate(25579,0)"><use xlink:href="#E2032-MJMATHI-76" x="0" y="0"></use><g transform="translate(485,-186)"><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-3C0" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMAIN-28" x="573" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-3B8" x="962" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMAIN-29" x="1431" y="0"></use></g></g><use xlink:href="#E2032-MJMAIN-28" x="27451" y="0"></use><g transform="translate(27840,0)"><use xlink:href="#E2032-MJMATHI-73" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMAIN-2032" x="663" y="583"></use></g><use xlink:href="#E2032-MJMAIN-29" x="28604" y="0"></use><use xlink:href="#E2032-MJSZ4-5D" x="28993" y="-1"></use></g></g><g transform="translate(0,1255)"><use xlink:href="#E2032-MJMAIN-3D" x="277" y="0"></use><g transform="translate(1333,0)"><use xlink:href="#E2032-MJSZ2-2211" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-73" x="786" y="-1485"></use></g><g transform="translate(2944,0)"><use xlink:href="#E2032-MJMAIN-50" x="0" y="0"></use><use xlink:href="#E2032-MJMAIN-72" x="681" y="0"></use></g><use xlink:href="#E2032-MJMAIN-5B" x="4017" y="0"></use><g transform="translate(4295,0)"><use xlink:href="#E2032-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E2032-MJMAIN-3D" x="5541" y="0"></use><use xlink:href="#E2032-MJMATHI-73" x="6597" y="0"></use><use 
xlink:href="#E2032-MJMAIN-5D" x="7066" y="0"></use><g transform="translate(7510,0)"><use xlink:href="#E2032-MJSZ4-5B"></use><g transform="translate(583,0)"><use xlink:href="#E2032-MJSZ2-2211" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-61" x="756" y="-1485"></use></g><g transform="translate(2193,0)"><use xlink:href="#E2032-MJMATHI-71" x="0" y="0"></use><g transform="translate(446,-186)"><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-3C0" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMAIN-28" x="573" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-3B8" x="962" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMAIN-29" x="1431" y="0"></use></g></g><use xlink:href="#E2032-MJMAIN-28" x="4026" y="0"></use><use xlink:href="#E2032-MJMATHI-73" x="4415" y="0"></use><use xlink:href="#E2032-MJMAIN-2C" x="4884" y="0"></use><use xlink:href="#E2032-MJMATHI-61" x="5329" y="0"></use><use xlink:href="#E2032-MJMAIN-29" x="5858" y="0"></use><use xlink:href="#E2032-MJMAIN-2207" x="6247" y="0"></use><use xlink:href="#E2032-MJMATHI-3C0" x="7080" y="0"></use><use xlink:href="#E2032-MJMAIN-28" x="7653" y="0"></use><use xlink:href="#E2032-MJMATHI-61" x="8042" y="0"></use><use xlink:href="#E2032-MJMAIN-2223" x="8849" y="0"></use><use xlink:href="#E2032-MJMATHI-73" x="9404" y="0"></use><use xlink:href="#E2032-MJMAIN-3B" x="9873" y="0"></use><use xlink:href="#E2032-MJMATHI-3B8" x="10318" y="0"></use><use xlink:href="#E2032-MJMAIN-29" x="10787" y="0"></use><use xlink:href="#E2032-MJMAIN-2B" x="11398" y="0"></use><g transform="translate(12398,0)"><use xlink:href="#E2032-MJSZ2-2211" x="0" y="0"></use><g transform="translate(452,-1154)"><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-73" x="0" y="0"></use><use transform="scale(0.5)" xlink:href="#E2032-MJMAIN-2032" x="663" y="513"></use></g></g><g transform="translate(14009,0)"><use xlink:href="#E2032-MJMAIN-50" x="0" y="0"></use><use 
xlink:href="#E2032-MJMAIN-72" x="681" y="0"></use></g><use xlink:href="#E2032-MJMAIN-28" x="15082" y="0"></use><g transform="translate(15471,0)"><use xlink:href="#E2032-MJMATHI-53" x="0" y="0"></use><g transform="translate(613,-150)"><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-74" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMAIN-2B" x="361" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMAIN-31" x="1139" y="0"></use></g></g><use xlink:href="#E2032-MJMAIN-3D" x="17621" y="0"></use><g transform="translate(18677,0)"><use xlink:href="#E2032-MJMATHI-73" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMAIN-2032" x="663" y="583"></use></g><use xlink:href="#E2032-MJMAIN-2223" x="19718" y="0"></use><g transform="translate(20274,0)"><use xlink:href="#E2032-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E2032-MJMAIN-3D" x="21520" y="0"></use><use xlink:href="#E2032-MJMATHI-73" x="22575" y="0"></use><use xlink:href="#E2032-MJMAIN-3B" x="23044" y="0"></use><use xlink:href="#E2032-MJMATHI-3B8" x="23489" y="0"></use><use xlink:href="#E2032-MJMAIN-29" x="23958" y="0"></use><use xlink:href="#E2032-MJMATHI-3B3" x="24347" y="0"></use><use xlink:href="#E2032-MJMAIN-2207" x="24890" y="0"></use><g transform="translate(25723,0)"><use xlink:href="#E2032-MJMATHI-76" x="0" y="0"></use><g transform="translate(485,-186)"><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-3C0" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMAIN-28" x="573" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-3B8" x="962" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMAIN-29" x="1431" y="0"></use></g></g><use xlink:href="#E2032-MJMAIN-28" x="27595" y="0"></use><g transform="translate(27984,0)"><use xlink:href="#E2032-MJMATHI-73" x="0" y="0"></use><use transform="scale(0.707)" 
xlink:href="#E2032-MJMAIN-2032" x="663" y="583"></use></g><use xlink:href="#E2032-MJMAIN-29" x="28747" y="0"></use><use xlink:href="#E2032-MJSZ4-5D" x="29136" y="-1"></use></g></g><g transform="translate(0,-1256)"><use xlink:href="#E2032-MJMAIN-3D" x="277" y="0"></use><g transform="translate(1333,0)"><use xlink:href="#E2032-MJSZ2-2211" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-73" x="786" y="-1485"></use></g><g transform="translate(2944,0)"><use xlink:href="#E2032-MJMAIN-50" x="0" y="0"></use><use xlink:href="#E2032-MJMAIN-72" x="681" y="0"></use></g><use xlink:href="#E2032-MJMAIN-5B" x="4017" y="0"></use><g transform="translate(4295,0)"><use xlink:href="#E2032-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E2032-MJMAIN-3D" x="5541" y="0"></use><use xlink:href="#E2032-MJMATHI-73" x="6597" y="0"></use><use xlink:href="#E2032-MJMAIN-5D" x="7066" y="0"></use><g transform="translate(7510,0)"><use xlink:href="#E2032-MJSZ2-2211" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-61" x="756" y="-1485"></use></g><g transform="translate(9121,0)"><use xlink:href="#E2032-MJMATHI-71" x="0" y="0"></use><g transform="translate(446,-186)"><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-3C0" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMAIN-28" x="573" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-3B8" x="962" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMAIN-29" x="1431" y="0"></use></g></g><use xlink:href="#E2032-MJMAIN-28" x="10954" y="0"></use><use xlink:href="#E2032-MJMATHI-73" x="11343" y="0"></use><use xlink:href="#E2032-MJMAIN-2C" x="11812" y="0"></use><use xlink:href="#E2032-MJMATHI-61" x="12256" y="0"></use><use xlink:href="#E2032-MJMAIN-29" x="12785" y="0"></use><use xlink:href="#E2032-MJMAIN-2207" x="13174" y="0"></use><use 
xlink:href="#E2032-MJMATHI-3C0" x="14007" y="0"></use><use xlink:href="#E2032-MJMAIN-28" x="14580" y="0"></use><use xlink:href="#E2032-MJMATHI-61" x="14969" y="0"></use><use xlink:href="#E2032-MJMAIN-2223" x="15776" y="0"></use><use xlink:href="#E2032-MJMATHI-73" x="16332" y="0"></use><use xlink:href="#E2032-MJMAIN-3B" x="16801" y="0"></use><use xlink:href="#E2032-MJMATHI-3B8" x="17246" y="0"></use><use xlink:href="#E2032-MJMAIN-29" x="17715" y="0"></use><use xlink:href="#E2032-MJMAIN-2B" x="18326" y="0"></use><g transform="translate(19326,0)"><use xlink:href="#E2032-MJSZ2-2211" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-73" x="786" y="-1485"></use></g><g transform="translate(20937,0)"><use xlink:href="#E2032-MJMAIN-50" x="0" y="0"></use><use xlink:href="#E2032-MJMAIN-72" x="681" y="0"></use></g><use xlink:href="#E2032-MJMAIN-5B" x="22010" y="0"></use><g transform="translate(22288,0)"><use xlink:href="#E2032-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E2032-MJMAIN-3D" x="23534" y="0"></use><use xlink:href="#E2032-MJMATHI-73" x="24590" y="0"></use><use xlink:href="#E2032-MJMAIN-5D" x="25059" y="0"></use><g transform="translate(25503,0)"><use xlink:href="#E2032-MJSZ2-2211" x="0" y="0"></use><g transform="translate(452,-1154)"><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-73" x="0" y="0"></use><use transform="scale(0.5)" xlink:href="#E2032-MJMAIN-2032" x="663" y="513"></use></g></g><g transform="translate(27114,0)"><use xlink:href="#E2032-MJMAIN-50" x="0" y="0"></use><use xlink:href="#E2032-MJMAIN-72" x="681" y="0"></use></g><use xlink:href="#E2032-MJMAIN-28" x="28187" y="0"></use><g transform="translate(28576,0)"><use xlink:href="#E2032-MJMATHI-53" x="0" y="0"></use><g transform="translate(613,-150)"><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-74" x="0" y="0"></use><use transform="scale(0.707)" 
xlink:href="#E2032-MJMAIN-2B" x="361" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMAIN-31" x="1139" y="0"></use></g></g><use xlink:href="#E2032-MJMAIN-3D" x="30726" y="0"></use><g transform="translate(31781,0)"><use xlink:href="#E2032-MJMATHI-73" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMAIN-2032" x="663" y="583"></use></g><use xlink:href="#E2032-MJMAIN-2223" x="32823" y="0"></use><g transform="translate(33378,0)"><use xlink:href="#E2032-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E2032-MJMAIN-3D" x="34625" y="0"></use><use xlink:href="#E2032-MJMATHI-73" x="35680" y="0"></use><use xlink:href="#E2032-MJMAIN-3B" x="36149" y="0"></use><use xlink:href="#E2032-MJMATHI-3B8" x="36594" y="0"></use><use xlink:href="#E2032-MJMAIN-29" x="37063" y="0"></use><use xlink:href="#E2032-MJMATHI-3B3" x="37452" y="0"></use><use xlink:href="#E2032-MJMAIN-2207" x="37995" y="0"></use><g transform="translate(38828,0)"><use xlink:href="#E2032-MJMATHI-76" x="0" y="0"></use><g transform="translate(485,-186)"><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-3C0" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMAIN-28" x="573" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-3B8" x="962" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMAIN-29" x="1431" y="0"></use></g></g><use xlink:href="#E2032-MJMAIN-28" x="40700" y="0"></use><g transform="translate(41089,0)"><use xlink:href="#E2032-MJMATHI-73" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMAIN-2032" x="663" y="583"></use></g><use xlink:href="#E2032-MJMAIN-29" x="41852" y="0"></use></g><g transform="translate(0,-3766)"><use xlink:href="#E2032-MJMAIN-3D" x="277" y="0"></use><g transform="translate(1333,0)"><use xlink:href="#E2032-MJSZ2-2211" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-73" 
x="786" y="-1485"></use></g><g transform="translate(2944,0)"><use xlink:href="#E2032-MJMAIN-50" x="0" y="0"></use><use xlink:href="#E2032-MJMAIN-72" x="681" y="0"></use></g><use xlink:href="#E2032-MJMAIN-5B" x="4017" y="0"></use><g transform="translate(4295,0)"><use xlink:href="#E2032-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E2032-MJMAIN-3D" x="5541" y="0"></use><use xlink:href="#E2032-MJMATHI-73" x="6597" y="0"></use><use xlink:href="#E2032-MJMAIN-5D" x="7066" y="0"></use><g transform="translate(7510,0)"><use xlink:href="#E2032-MJSZ2-2211" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-61" x="756" y="-1485"></use></g><g transform="translate(9121,0)"><use xlink:href="#E2032-MJMATHI-71" x="0" y="0"></use><g transform="translate(446,-186)"><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-3C0" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMAIN-28" x="573" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-3B8" x="962" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMAIN-29" x="1431" y="0"></use></g></g><use xlink:href="#E2032-MJMAIN-28" x="10954" y="0"></use><use xlink:href="#E2032-MJMATHI-73" x="11343" y="0"></use><use xlink:href="#E2032-MJMAIN-2C" x="11812" y="0"></use><use xlink:href="#E2032-MJMATHI-61" x="12256" y="0"></use><use xlink:href="#E2032-MJMAIN-29" x="12785" y="0"></use><use xlink:href="#E2032-MJMAIN-2207" x="13174" y="0"></use><use xlink:href="#E2032-MJMATHI-3C0" x="14007" y="0"></use><use xlink:href="#E2032-MJMAIN-28" x="14580" y="0"></use><use xlink:href="#E2032-MJMATHI-61" x="14969" y="0"></use><use xlink:href="#E2032-MJMAIN-2223" x="15776" y="0"></use><use xlink:href="#E2032-MJMATHI-73" x="16332" y="0"></use><use xlink:href="#E2032-MJMAIN-3B" x="16801" y="0"></use><use xlink:href="#E2032-MJMATHI-3B8" x="17246" y="0"></use><use xlink:href="#E2032-MJMAIN-29" x="17715" 
y="0"></use><use xlink:href="#E2032-MJMAIN-2B" x="18326" y="0"></use><use xlink:href="#E2032-MJMATHI-3B3" x="19326" y="0"></use><g transform="translate(20036,0)"><use xlink:href="#E2032-MJSZ2-2211" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-73" x="786" y="-1485"></use></g><g transform="translate(21646,0)"><use xlink:href="#E2032-MJMAIN-50" x="0" y="0"></use><use xlink:href="#E2032-MJMAIN-72" x="681" y="0"></use></g><use xlink:href="#E2032-MJMAIN-5B" x="22719" y="0"></use><g transform="translate(22997,0)"><use xlink:href="#E2032-MJMATHI-53" x="0" y="0"></use><g transform="translate(613,-150)"><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-74" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMAIN-2B" x="361" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMAIN-31" x="1139" y="0"></use></g></g><use xlink:href="#E2032-MJMAIN-3D" x="25147" y="0"></use><g transform="translate(26203,0)"><use xlink:href="#E2032-MJMATHI-73" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMAIN-2032" x="663" y="583"></use></g><use xlink:href="#E2032-MJMAIN-3B" x="26966" y="0"></use><use xlink:href="#E2032-MJMATHI-3B8" x="27411" y="0"></use><use xlink:href="#E2032-MJMAIN-5D" x="27880" y="0"></use><use xlink:href="#E2032-MJMAIN-2207" x="28158" y="0"></use><g transform="translate(28991,0)"><use xlink:href="#E2032-MJMATHI-76" x="0" y="0"></use><g transform="translate(485,-186)"><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-3C0" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMAIN-28" x="573" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-3B8" x="962" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMAIN-29" x="1431" y="0"></use></g></g><use xlink:href="#E2032-MJMAIN-28" x="30863" y="0"></use><g transform="translate(31252,0)"><use xlink:href="#E2032-MJMATHI-73" x="0" y="0"></use><use transform="scale(0.707)" 
xlink:href="#E2032-MJMAIN-2032" x="663" y="583"></use></g><use xlink:href="#E2032-MJMAIN-29" x="32015" y="0"></use></g><g transform="translate(0,-6973)"><use xlink:href="#E2032-MJMAIN-3D" x="277" y="0"></use><use xlink:href="#E2032-MJMATHI-45" x="1333" y="0"></use><g transform="translate(2264,0)"><use xlink:href="#E2032-MJSZ4-5B"></use><g transform="translate(583,0)"><use xlink:href="#E2032-MJSZ2-2211" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-61" x="756" y="-1485"></use></g><g transform="translate(2193,0)"><use xlink:href="#E2032-MJMATHI-71" x="0" y="0"></use><g transform="translate(446,-186)"><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-3C0" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMAIN-28" x="573" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-3B8" x="962" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMAIN-29" x="1431" y="0"></use></g></g><use xlink:href="#E2032-MJMAIN-28" x="4026" y="0"></use><g transform="translate(4415,0)"><use xlink:href="#E2032-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E2032-MJMAIN-2C" x="5383" y="0"></use><use xlink:href="#E2032-MJMATHI-61" x="5828" y="0"></use><use xlink:href="#E2032-MJMAIN-29" x="6357" y="0"></use><use xlink:href="#E2032-MJMAIN-2207" x="6746" y="0"></use><use xlink:href="#E2032-MJMATHI-3C0" x="7579" y="0"></use><use xlink:href="#E2032-MJMAIN-28" x="8152" y="0"></use><use xlink:href="#E2032-MJMATHI-61" x="8541" y="0"></use><use xlink:href="#E2032-MJMAIN-2223" x="9348" y="0"></use><g transform="translate(9904,0)"><use xlink:href="#E2032-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E2032-MJMAIN-3B" x="10872" y="0"></use><use xlink:href="#E2032-MJMATHI-3B8" x="11317" y="0"></use><use xlink:href="#E2032-MJMAIN-29" x="11786" 
y="0"></use><use xlink:href="#E2032-MJSZ4-5D" x="12175" y="-1"></use></g><use xlink:href="#E2032-MJMAIN-2B" x="15244" y="0"></use><use xlink:href="#E2032-MJMATHI-3B3" x="16244" y="0"></use><use xlink:href="#E2032-MJMATHI-45" x="16787" y="0"></use><use xlink:href="#E2032-MJMAIN-5B" x="17551" y="0"></use><use xlink:href="#E2032-MJMAIN-2207" x="17829" y="0"></use><g transform="translate(18662,0)"><use xlink:href="#E2032-MJMATHI-76" x="0" y="0"></use><g transform="translate(485,-186)"><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-3C0" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMAIN-28" x="573" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-3B8" x="962" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMAIN-29" x="1431" y="0"></use></g></g><use xlink:href="#E2032-MJMAIN-28" x="20534" y="0"></use><g transform="translate(20923,0)"><use xlink:href="#E2032-MJMATHI-53" x="0" y="0"></use><g transform="translate(613,-150)"><use transform="scale(0.707)" xlink:href="#E2032-MJMATHI-74" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMAIN-2B" x="361" y="0"></use><use transform="scale(0.707)" xlink:href="#E2032-MJMAIN-31" x="1139" y="0"></use></g></g><use xlink:href="#E2032-MJMAIN-29" x="22795" y="0"></use><use xlink:href="#E2032-MJMAIN-5D" x="23184" y="0"></use></g></g></g></g></svg></span></div><script type="math/tex; mode=display" id="MathJax-Element-909">\begin{split}
&E[\nabla v_{\pi(\theta)}(S_t)] = \sum_s {\rm Pr} [S_t=s] \nabla v_{\pi(\theta)}(s) \\
&= \sum_s {\rm Pr} [S_t=s] \left[\sum_a q_{\pi(\theta)}(s,a)\nabla\pi(a \mid s;\theta) + \sum_a \pi(a \mid s;\theta) \gamma\sum_{s'}p(s' \mid s,a) \nabla v_{\pi(\theta)}(s')  \right] \\
&= \sum_s {\rm Pr} [S_t=s] \left[\sum_a q_{\pi(\theta)}(s,a)\nabla\pi(a \mid s;\theta) + \sum_{s'} {\rm Pr} (S_{t+1}=s' \mid S_t=s;\theta) \gamma \nabla v_{\pi(\theta)}(s')  \right] \\
&= \sum_s {\rm Pr} [S_t=s] \sum_a q_{\pi(\theta)}(s,a)\nabla\pi(a \mid s;\theta) + \sum_s {\rm Pr} [S_t=s] \sum_{s'} {\rm Pr} (S_{t+1}=s' \mid S_t=s;\theta) \gamma \nabla v_{\pi(\theta)}(s') \\
&= \sum_s {\rm Pr} [S_t=s] \sum_a q_{\pi(\theta)}(s,a)\nabla\pi(a \mid s;\theta) + \gamma \sum_{s'} {\rm Pr} [S_{t+1}=s';\theta] \nabla v_{\pi(\theta)}(s') \\
&=E\left[\sum_a q_{\pi(\theta)}(S_t,a) \nabla\pi(a \mid S_t;\theta) \right] + \gamma E[\nabla v_{\pi(\theta)}(S_{t+1})]
\end{split}</script>
					</div></div><p><span>This links </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="13.404ex" height="2.903ex" viewBox="0 -832.7 5771.2 1250" role="img" focusable="false" style="vertical-align: -0.969ex;"><defs><path stroke-width="0" id="E1952-MJMATHI-45" d="M492 213Q472 213 472 226Q472 230 477 250T482 285Q482 316 461 323T364 330H312Q311 328 277 192T243 52Q243 48 254 48T334 46Q428 46 458 48T518 61Q567 77 599 117T670 248Q680 270 683 272Q690 274 698 274Q718 274 718 261Q613 7 608 2Q605 0 322 0H133Q31 0 31 11Q31 13 34 25Q38 41 42 43T65 46Q92 46 125 49Q139 52 144 61Q146 66 215 342T285 622Q285 629 281 629Q273 632 228 634H197Q191 640 191 642T193 659Q197 676 203 680H757Q764 676 764 669Q764 664 751 557T737 447Q735 440 717 440H705Q698 445 698 453L701 476Q704 500 704 528Q704 558 697 578T678 609T643 625T596 632T532 634H485Q397 633 392 631Q388 629 386 622Q385 619 355 499T324 377Q347 376 372 376H398Q464 376 489 391T534 472Q538 488 540 490T557 493Q562 493 565 493T570 492T572 491T574 487T577 483L544 351Q511 218 508 216Q505 213 492 213Z"></path><path stroke-width="0" id="E1952-MJMAIN-5B" d="M118 -250V750H255V710H158V-210H255V-250H118Z"></path><path stroke-width="0" id="E1952-MJMAIN-2207" d="M46 676Q46 679 51 683H781Q786 679 786 676Q786 674 617 326T444 -26Q439 -33 416 -33T388 -26Q385 -22 216 326T46 676ZM697 596Q697 597 445 597T193 596Q195 591 319 336T445 80L697 596Z"></path><path stroke-width="0" id="E1952-MJMATHI-76" d="M173 380Q173 405 154 405Q130 405 104 376T61 287Q60 286 59 284T58 281T56 279T53 278T49 278T41 278H27Q21 284 21 287Q21 294 29 316T53 368T97 419T160 441Q202 441 225 417T249 361Q249 344 246 335Q246 329 231 291T200 202T182 113Q182 86 187 69Q200 26 250 26Q287 26 319 60T369 139T398 222T409 277Q409 300 401 317T383 343T365 361T357 383Q357 405 376 424T417 443Q436 443 451 425T467 367Q467 340 455 284T418 159T347 40T241 -11Q177 -11 139 22Q102 54 102 117Q102 148 110 181T151 298Q173 
362 173 380Z"></path><path stroke-width="0" id="E1952-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1952-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1952-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E1952-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path><path stroke-width="0" id="E1952-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 462Q176 523 208 573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 577T585 459T569 456Q549 456 549 465Q549 471 550 475Q550 478 551 494T553 520Q553 554 544 579T526 616T501 641Q465 662 419 
662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 -18T65 -22Q52 -22 52 -14Q52 -11 110 221Q112 227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 153Q144 114 160 87T203 47T255 29T308 24Z"></path><path stroke-width="0" id="E1952-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E1952-MJMAIN-5D" d="M22 710V750H159V-250H22V-210H119V710H22Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1952-MJMATHI-45" x="0" y="0"></use><use xlink:href="#E1952-MJMAIN-5B" x="764" y="0"></use><use xlink:href="#E1952-MJMAIN-2207" x="1042" y="0"></use><g transform="translate(1875,0)"><use xlink:href="#E1952-MJMATHI-76" x="0" y="0"></use><g transform="translate(485,-186)"><use transform="scale(0.707)" xlink:href="#E1952-MJMATHI-3C0" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1952-MJMAIN-28" x="573" y="0"></use><use transform="scale(0.707)" xlink:href="#E1952-MJMATHI-3B8" x="962" y="0"></use><use transform="scale(0.707)" xlink:href="#E1952-MJMAIN-29" x="1431" y="0"></use></g></g><use xlink:href="#E1952-MJMAIN-28" x="3746" y="0"></use><g transform="translate(4135,0)"><use xlink:href="#E1952-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1952-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1952-MJMAIN-29" x="5104" 
y="0"></use><use xlink:href="#E1952-MJMAIN-5D" x="5493" y="0"></use></g></svg></span><script type="math/tex">E[\nabla v_{\pi(\theta)}(S_t)]</script><span> to </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="15.503ex" height="2.903ex" viewBox="0 -832.7 6674.9 1250" role="img" focusable="false" style="vertical-align: -0.969ex;"><defs><path stroke-width="0" id="E1953-MJMATHI-45" d="M492 213Q472 213 472 226Q472 230 477 250T482 285Q482 316 461 323T364 330H312Q311 328 277 192T243 52Q243 48 254 48T334 46Q428 46 458 48T518 61Q567 77 599 117T670 248Q680 270 683 272Q690 274 698 274Q718 274 718 261Q613 7 608 2Q605 0 322 0H133Q31 0 31 11Q31 13 34 25Q38 41 42 43T65 46Q92 46 125 49Q139 52 144 61Q146 66 215 342T285 622Q285 629 281 629Q273 632 228 634H197Q191 640 191 642T193 659Q197 676 203 680H757Q764 676 764 669Q764 664 751 557T737 447Q735 440 717 440H705Q698 445 698 453L701 476Q704 500 704 528Q704 558 697 578T678 609T643 625T596 632T532 634H485Q397 633 392 631Q388 629 386 622Q385 619 355 499T324 377Q347 376 372 376H398Q464 376 489 391T534 472Q538 488 540 490T557 493Q562 493 565 493T570 492T572 491T574 487T577 483L544 351Q511 218 508 216Q505 213 492 213Z"></path><path stroke-width="0" id="E1953-MJMAIN-5B" d="M118 -250V750H255V710H158V-210H255V-250H118Z"></path><path stroke-width="0" id="E1953-MJMAIN-2207" d="M46 676Q46 679 51 683H781Q786 679 786 676Q786 674 617 326T444 -26Q439 -33 416 -33T388 -26Q385 -22 216 326T46 676ZM697 596Q697 597 445 597T193 596Q195 591 319 336T445 80L697 596Z"></path><path stroke-width="0" id="E1953-MJMATHI-76" d="M173 380Q173 405 154 405Q130 405 104 376T61 287Q60 286 59 284T58 281T56 279T53 278T49 278T41 278H27Q21 284 21 287Q21 294 29 316T53 368T97 419T160 441Q202 441 225 417T249 361Q249 344 246 335Q246 329 231 291T200 202T182 113Q182 86 187 69Q200 26 250 26Q287 26 319 60T369 139T398 222T409 277Q409 300 401 317T383 343T365 361T357 383Q357 405 376 424T417 
443Q436 443 451 425T467 367Q467 340 455 284T418 159T347 40T241 -11Q177 -11 139 22Q102 54 102 117Q102 148 110 181T151 298Q173 362 173 380Z"></path><path stroke-width="0" id="E1953-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1953-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1953-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E1953-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path><path stroke-width="0" id="E1953-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 462Q176 523 208 573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 
577T585 459T569 456Q549 456 549 465Q549 471 550 475Q550 478 551 494T553 520Q553 554 544 579T526 616T501 641Q465 662 419 662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 -18T65 -22Q52 -22 52 -14Q52 -11 110 221Q112 227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 153Q144 114 160 87T203 47T255 29T308 24Z"></path><path stroke-width="0" id="E1953-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E1953-MJMAIN-2B" d="M56 237T56 250T70 270H369V420L370 570Q380 583 389 583Q402 583 409 568V270H707Q722 262 722 250T707 230H409V-68Q401 -82 391 -82H389H387Q375 -82 369 -68V230H70Q56 237 56 250Z"></path><path stroke-width="0" id="E1953-MJMAIN-31" d="M213 578L200 573Q186 568 160 563T102 556H83V602H102Q149 604 189 617T245 641T273 663Q275 666 285 666Q294 666 302 660V361L303 61Q310 54 315 52T339 48T401 46H427V0H416Q395 3 257 3Q121 3 100 0H88V46H114Q136 46 152 46T177 47T193 50T201 52T207 57T213 61V578Z"></path><path stroke-width="0" id="E1953-MJMAIN-5D" d="M22 710V750H159V-250H22V-210H119V710H22Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1953-MJMATHI-45" x="0" y="0"></use><use xlink:href="#E1953-MJMAIN-5B" x="764" y="0"></use><use xlink:href="#E1953-MJMAIN-2207" x="1042" y="0"></use><g transform="translate(1875,0)"><use xlink:href="#E1953-MJMATHI-76" x="0" 
y="0"></use><g transform="translate(485,-186)"><use transform="scale(0.707)" xlink:href="#E1953-MJMATHI-3C0" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1953-MJMAIN-28" x="573" y="0"></use><use transform="scale(0.707)" xlink:href="#E1953-MJMATHI-3B8" x="962" y="0"></use><use transform="scale(0.707)" xlink:href="#E1953-MJMAIN-29" x="1431" y="0"></use></g></g><use xlink:href="#E1953-MJMAIN-28" x="3746" y="0"></use><g transform="translate(4135,0)"><use xlink:href="#E1953-MJMATHI-53" x="0" y="0"></use><g transform="translate(613,-150)"><use transform="scale(0.707)" xlink:href="#E1953-MJMATHI-74" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1953-MJMAIN-2B" x="361" y="0"></use><use transform="scale(0.707)" xlink:href="#E1953-MJMAIN-31" x="1139" y="0"></use></g></g><use xlink:href="#E1953-MJMAIN-29" x="6007" y="0"></use><use xlink:href="#E1953-MJMAIN-5D" x="6396" y="0"></use></g></svg></span><script type="math/tex">E[\nabla v_{\pi(\theta)}(S_{t+1})]</script><span> in a recurrence relation. Noting further that the gradient we ultimately care about is </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="11.04ex" height="2.903ex" viewBox="0 -832.7 4753.5 1250" role="img" focusable="false" style="vertical-align: -0.969ex;"><defs><path stroke-width="0" id="E1954-MJMAIN-2207" d="M46 676Q46 679 51 683H781Q786 679 786 676Q786 674 617 326T444 -26Q439 -33 416 -33T388 -26Q385 -22 216 326T46 676ZM697 596Q697 597 445 597T193 596Q195 591 319 336T445 80L697 596Z"></path><path stroke-width="0" id="E1954-MJMATHI-45" d="M492 213Q472 213 472 226Q472 230 477 250T482 285Q482 316 461 323T364 330H312Q311 328 277 192T243 52Q243 48 254 48T334 46Q428 46 458 48T518 61Q567 77 599 117T670 248Q680 270 683 272Q690 274 698 274Q718 274 718 261Q613 7 608 2Q605 0 322 0H133Q31 0 31 11Q31 13 34 25Q38 41 42 43T65 46Q92 46 125 49Q139 52 144 61Q146 66 215 342T285 622Q285 629 281 629Q273 632 228 634H197Q191 640 191 642T193 659Q197 676 203 680H757Q764 
676 764 669Q764 664 751 557T737 447Q735 440 717 440H705Q698 445 698 453L701 476Q704 500 704 528Q704 558 697 578T678 609T643 625T596 632T532 634H485Q397 633 392 631Q388 629 386 622Q385 619 355 499T324 377Q347 376 372 376H398Q464 376 489 391T534 472Q538 488 540 490T557 493Q562 493 565 493T570 492T572 491T574 487T577 483L544 351Q511 218 508 216Q505 213 492 213Z"></path><path stroke-width="0" id="E1954-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1954-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1954-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E1954-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path><path stroke-width="0" id="E1954-MJMAIN-5B" d="M118 
-250V750H255V710H158V-210H255V-250H118Z"></path><path stroke-width="0" id="E1954-MJMATHI-47" d="M50 252Q50 367 117 473T286 641T490 704Q580 704 633 653Q642 643 648 636T656 626L657 623Q660 623 684 649Q691 655 699 663T715 679T725 690L740 705H746Q760 705 760 698Q760 694 728 561Q692 422 692 421Q690 416 687 415T669 413H653Q647 419 647 422Q647 423 648 429T650 449T651 481Q651 552 619 605T510 659Q492 659 471 656T418 643T357 615T294 567T236 496T189 394T158 260Q156 242 156 221Q156 173 170 136T206 79T256 45T308 28T353 24Q407 24 452 47T514 106Q517 114 529 161T541 214Q541 222 528 224T468 227H431Q425 233 425 235T427 254Q431 267 437 273H454Q494 271 594 271Q634 271 659 271T695 272T707 272Q721 272 721 263Q721 261 719 249Q714 230 709 228Q706 227 694 227Q674 227 653 224Q646 221 643 215T629 164Q620 131 614 108Q589 6 586 3Q584 1 581 1Q571 1 553 21T530 52Q530 53 528 52T522 47Q448 -22 322 -22Q201 -22 126 55T50 252Z"></path><path stroke-width="0" id="E1954-MJMAIN-30" d="M96 585Q152 666 249 666Q297 666 345 640T423 548Q460 465 460 320Q460 165 417 83Q397 41 362 16T301 -15T250 -22Q224 -22 198 -16T137 16T82 83Q39 165 39 320Q39 494 96 585ZM321 597Q291 629 250 629Q208 629 178 597Q153 571 145 525T137 333Q137 175 145 125T181 46Q209 16 250 16Q290 16 318 46Q347 76 354 130T362 333Q362 478 354 524T321 597Z"></path><path stroke-width="0" id="E1954-MJMAIN-5D" d="M22 710V750H159V-250H22V-210H119V710H22Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1954-MJMAIN-2207" x="0" y="0"></use><g transform="translate(833,0)"><use xlink:href="#E1954-MJMATHI-45" x="0" y="0"></use><g transform="translate(738,-186)"><use transform="scale(0.707)" xlink:href="#E1954-MJMATHI-3C0" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1954-MJMAIN-28" x="573" y="0"></use><use transform="scale(0.707)" xlink:href="#E1954-MJMATHI-3B8" x="962" y="0"></use><use transform="scale(0.707)" xlink:href="#E1954-MJMAIN-29" x="1431" 
y="0"></use></g></g><use xlink:href="#E1954-MJMAIN-5B" x="2957" y="0"></use><g transform="translate(3235,0)"><use xlink:href="#E1954-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1954-MJMAIN-30" x="1111" y="-213"></use></g><use xlink:href="#E1954-MJMAIN-5D" x="4475" y="0"></use></g></svg></span><script type="math/tex">\nabla E_{\pi(\theta)}[G_0]</script><span>, we have:</span></p><div contenteditable="false" spellcheck="false" class="mathjax-block md-end-block md-math-block md-rawblock" id="mathjax-n14" cid="n14" mdtype="math_block"><div class="md-rawblock-container md-math-container" tabindex="-1"><div class="MathJax_SVG_Display" style="text-align: center;"><span class="MathJax_SVG" id="MathJax-Element-823-Frame" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="65.894ex" height="66.955ex" viewBox="0 -14663 28371 28827.6" role="img" focusable="false" style="vertical-align: -32.899ex; max-width: 100%;"><defs><path stroke-width="0" id="E1925-MJMAIN-2207" d="M46 676Q46 679 51 683H781Q786 679 786 676Q786 674 617 326T444 -26Q439 -33 416 -33T388 -26Q385 -22 216 326T46 676ZM697 596Q697 597 445 597T193 596Q195 591 319 336T445 80L697 596Z"></path><path stroke-width="0" id="E1925-MJMATHI-45" d="M492 213Q472 213 472 226Q472 230 477 250T482 285Q482 316 461 323T364 330H312Q311 328 277 192T243 52Q243 48 254 48T334 46Q428 46 458 48T518 61Q567 77 599 117T670 248Q680 270 683 272Q690 274 698 274Q718 274 718 261Q613 7 608 2Q605 0 322 0H133Q31 0 31 11Q31 13 34 25Q38 41 42 43T65 46Q92 46 125 49Q139 52 144 61Q146 66 215 342T285 622Q285 629 281 629Q273 632 228 634H197Q191 640 191 642T193 659Q197 676 203 680H757Q764 676 764 669Q764 664 751 557T737 447Q735 440 717 440H705Q698 445 698 453L701 476Q704 500 704 528Q704 558 697 578T678 609T643 625T596 632T532 634H485Q397 633 392 631Q388 629 386 622Q385 619 355 499T324 377Q347 376 372 376H398Q464 376 489 391T534 472Q538 488 540 490T557 493Q562 493 565 
493T570 492T572 491T574 487T577 483L544 351Q511 218 508 216Q505 213 492 213Z"></path><path stroke-width="0" id="E1925-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1925-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1925-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E1925-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path><path stroke-width="0" id="E1925-MJMAIN-5B" d="M118 -250V750H255V710H158V-210H255V-250H118Z"></path><path stroke-width="0" id="E1925-MJMATHI-47" d="M50 252Q50 367 117 473T286 641T490 704Q580 704 633 653Q642 643 648 636T656 626L657 623Q660 623 684 649Q691 655 699 663T715 679T725 690L740 705H746Q760 705 760 698Q760 694 728 561Q692 422 692 
421Q690 416 687 415T669 413H653Q647 419 647 422Q647 423 648 429T650 449T651 481Q651 552 619 605T510 659Q492 659 471 656T418 643T357 615T294 567T236 496T189 394T158 260Q156 242 156 221Q156 173 170 136T206 79T256 45T308 28T353 24Q407 24 452 47T514 106Q517 114 529 161T541 214Q541 222 528 224T468 227H431Q425 233 425 235T427 254Q431 267 437 273H454Q494 271 594 271Q634 271 659 271T695 272T707 272Q721 272 721 263Q721 261 719 249Q714 230 709 228Q706 227 694 227Q674 227 653 224Q646 221 643 215T629 164Q620 131 614 108Q589 6 586 3Q584 1 581 1Q571 1 553 21T530 52Q530 53 528 52T522 47Q448 -22 322 -22Q201 -22 126 55T50 252Z"></path><path stroke-width="0" id="E1925-MJMAIN-30" d="M96 585Q152 666 249 666Q297 666 345 640T423 548Q460 465 460 320Q460 165 417 83Q397 41 362 16T301 -15T250 -22Q224 -22 198 -16T137 16T82 83Q39 165 39 320Q39 494 96 585ZM321 597Q291 629 250 629Q208 629 178 597Q153 571 145 525T137 333Q137 175 145 125T181 46Q209 16 250 16Q290 16 318 46Q347 76 354 130T362 333Q362 478 354 524T321 597Z"></path><path stroke-width="0" id="E1925-MJMAIN-5D" d="M22 710V750H159V-250H22V-210H119V710H22Z"></path><path stroke-width="0" id="E1925-MJMAIN-3D" d="M56 347Q56 360 70 367H707Q722 359 722 347Q722 336 708 328L390 327H72Q56 332 56 347ZM56 153Q56 168 72 173H708Q722 163 722 153Q722 140 707 133H70Q56 140 56 153Z"></path><path stroke-width="0" id="E1925-MJMATHI-76" d="M173 380Q173 405 154 405Q130 405 104 376T61 287Q60 286 59 284T58 281T56 279T53 278T49 278T41 278H27Q21 284 21 287Q21 294 29 316T53 368T97 419T160 441Q202 441 225 417T249 361Q249 344 246 335Q246 329 231 291T200 202T182 113Q182 86 187 69Q200 26 250 26Q287 26 319 60T369 139T398 222T409 277Q409 300 401 317T383 343T365 361T357 383Q357 405 376 424T417 443Q436 443 451 425T467 367Q467 340 455 284T418 159T347 40T241 -11Q177 -11 139 22Q102 54 102 117Q102 148 110 181T151 298Q173 362 173 380Z"></path><path stroke-width="0" id="E1925-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 
462Q176 523 208 573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 577T585 459T569 456Q549 456 549 465Q549 471 550 475Q550 478 551 494T553 520Q553 554 544 579T526 616T501 641Q465 662 419 662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 -18T65 -22Q52 -22 52 -14Q52 -11 110 221Q112 227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 153Q144 114 160 87T203 47T255 29T308 24Z"></path><path stroke-width="0" id="E1925-MJSZ2-2211" d="M60 948Q63 950 665 950H1267L1325 815Q1384 677 1388 669H1348L1341 683Q1320 724 1285 761Q1235 809 1174 838T1033 881T882 898T699 902H574H543H251L259 891Q722 258 724 252Q725 250 724 246Q721 243 460 -56L196 -356Q196 -357 407 -357Q459 -357 548 -357T676 -358Q812 -358 896 -353T1063 -332T1204 -283T1307 -196Q1328 -170 1348 -124H1388Q1388 -125 1381 -145T1356 -210T1325 -294L1267 -449L666 -450Q64 -450 61 -448Q55 -446 55 -439Q55 -437 57 -433L590 177Q590 178 557 222T452 366T322 544L56 909L55 924Q55 945 60 948Z"></path><path stroke-width="0" id="E1925-MJMATHI-61" d="M33 157Q33 258 109 349T280 441Q331 441 370 392Q386 422 416 422Q429 422 439 414T449 394Q449 381 412 234T374 68Q374 43 381 35T402 26Q411 27 422 35Q443 55 463 131Q469 151 473 152Q475 153 483 153H487Q506 153 506 144Q506 138 501 117T481 63T449 13Q436 0 417 -8Q409 -10 393 -10Q359 -10 336 5T306 36L300 51Q299 52 296 50Q294 48 292 46Q233 -10 172 -10Q117 -10 75 30T33 157ZM351 328Q351 334 346 350T323 385T277 405Q242 405 210 374T160 293Q131 214 119 129Q119 126 119 118T118 106Q118 61 136 44T179 26Q217 26 254 59T298 110Q300 114 325 217T351 328Z"></path><path stroke-width="0" id="E1925-MJMATHI-71" d="M33 157Q33 258 109 349T280 441Q340 441 372 389Q373 390 377 395T388 406T404 418Q438 442 450 442Q454 442 457 439T460 434Q460 425 391 149Q320 -135 320 -139Q320 -147 365 
-148H390Q396 -156 396 -157T393 -175Q389 -188 383 -194H370Q339 -192 262 -192Q234 -192 211 -192T174 -192T157 -193Q143 -193 143 -185Q143 -182 145 -170Q149 -154 152 -151T172 -148Q220 -148 230 -141Q238 -136 258 -53T279 32Q279 33 272 29Q224 -10 172 -10Q117 -10 75 30T33 157ZM352 326Q329 405 277 405Q242 405 210 374T160 293Q131 214 119 129Q119 126 119 118T118 106Q118 61 136 44T179 26Q233 26 290 98L298 109L352 326Z"></path><path stroke-width="0" id="E1925-MJMAIN-2C" d="M78 35T78 60T94 103T137 121Q165 121 187 96T210 8Q210 -27 201 -60T180 -117T154 -158T130 -185T117 -194Q113 -194 104 -185T95 -172Q95 -168 106 -156T131 -126T157 -76T173 -3V9L172 8Q170 7 167 6T161 3T152 1T140 0Q113 0 96 17Z"></path><path stroke-width="0" id="E1925-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path stroke-width="0" id="E1925-MJMAIN-3B" d="M78 370Q78 394 95 412T138 430Q162 430 180 414T199 371Q199 346 182 328T139 310T96 327T78 370ZM78 60Q78 85 94 103T137 121Q202 121 202 8Q202 -44 183 -94T144 -169T118 -194Q115 -194 106 -186T95 -174Q94 -171 107 -155T137 -107T160 -38Q161 -32 162 -22T165 -4T165 4Q165 5 161 4T142 0Q110 0 94 18T78 60Z"></path><path stroke-width="0" id="E1925-MJSZ4-5B" d="M269 -1249V1750H577V1677H342V-1176H577V-1249H269Z"></path><path stroke-width="0" id="E1925-MJSZ4-5D" d="M5 1677V1750H313V-1249H5V-1176H240V1677H5Z"></path><path stroke-width="0" id="E1925-MJMAIN-2B" d="M56 237T56 250T70 270H369V420L370 570Q380 583 389 583Q402 583 409 568V270H707Q722 262 722 250T707 230H409V-68Q401 -82 391 -82H389H387Q375 -82 369 -68V230H70Q56 237 56 250Z"></path><path stroke-width="0" id="E1925-MJMATHI-3B3" d="M31 249Q11 249 11 258Q11 275 26 304T66 365T129 418T206 441Q233 441 239 440Q287 429 318 386T371 255Q385 195 385 170Q385 166 386 166L398 193Q418 244 443 300T486 391T508 430Q510 431 524 431H537Q543 425 543 422Q543 418 522 378T463 251T391 71Q385 55 378 6T357 -100Q341 -165 330 -190T303 -216Q286 -216 286 -188Q286 -138 340 
32L346 51L347 69Q348 79 348 100Q348 257 291 317Q251 355 196 355Q148 355 108 329T51 260Q49 251 47 251Q45 249 31 249Z"></path><path stroke-width="0" id="E1925-MJMAIN-31" d="M213 578L200 573Q186 568 160 563T102 556H83V602H102Q149 604 189 617T245 641T273 663Q275 666 285 666Q294 666 302 660V361L303 61Q310 54 315 52T339 48T401 46H427V0H416Q395 3 257 3Q121 3 100 0H88V46H114Q136 46 152 46T177 47T193 50T201 52T207 57T213 61V578Z"></path><path stroke-width="0" id="E1925-MJMAIN-22EF" d="M78 250Q78 274 95 292T138 310Q162 310 180 294T199 251Q199 226 182 208T139 190T96 207T78 250ZM525 250Q525 274 542 292T585 310Q609 310 627 294T646 251Q646 226 629 208T586 190T543 207T525 250ZM972 250Q972 274 989 292T1032 310Q1056 310 1074 294T1093 251Q1093 226 1076 208T1033 190T990 207T972 250Z"></path><path stroke-width="0" id="E1925-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E1925-MJMAIN-221E" d="M55 217Q55 305 111 373T254 442Q342 442 419 381Q457 350 493 303L507 284L514 294Q618 442 747 442Q833 442 888 374T944 214Q944 128 889 59T743 -11Q657 -11 580 50Q542 81 506 128L492 147L485 137Q381 -11 252 -11Q166 -11 111 57T55 217ZM907 217Q907 285 869 341T761 397Q740 397 720 392T682 378T648 359T619 335T594 310T574 285T559 263T548 246L543 238L574 198Q605 158 622 138T664 94T714 61T765 51Q827 51 867 100T907 217ZM92 214Q92 145 131 89T239 33Q357 33 456 193L425 233Q364 312 334 337Q285 380 233 380Q171 380 132 331T92 214Z"></path><path stroke-width="0" id="E1925-MJMAIN-6C" d="M42 46H56Q95 46 103 60V68Q103 77 103 91T103 124T104 
167T104 217T104 272T104 329Q104 366 104 407T104 482T104 542T103 586T103 603Q100 622 89 628T44 637H26V660Q26 683 28 683L38 684Q48 685 67 686T104 688Q121 689 141 690T171 693T182 694H185V379Q185 62 186 60Q190 52 198 49Q219 46 247 46H263V0H255L232 1Q209 2 183 2T145 3T107 3T57 1L34 0H26V46H42Z"></path><path stroke-width="0" id="E1925-MJMAIN-6E" d="M41 46H55Q94 46 102 60V68Q102 77 102 91T102 122T103 161T103 203Q103 234 103 269T102 328V351Q99 370 88 376T43 385H25V408Q25 431 27 431L37 432Q47 433 65 434T102 436Q119 437 138 438T167 441T178 442H181V402Q181 364 182 364T187 369T199 384T218 402T247 421T285 437Q305 442 336 442Q450 438 463 329Q464 322 464 190V104Q464 66 466 59T477 49Q498 46 526 46H542V0H534L510 1Q487 2 460 2T422 3Q319 3 310 0H302V46H318Q379 46 379 62Q380 64 380 200Q379 335 378 343Q372 371 358 385T334 402T308 404Q263 404 229 370Q202 343 195 315T187 232V168V108Q187 78 188 68T191 55T200 49Q221 46 249 46H265V0H257L234 1Q210 2 183 2T145 3Q42 3 33 0H25V46H41Z"></path><path stroke-width="0" id="E1925-MJSZ1-5B" d="M202 -349V850H394V810H242V-309H394V-349H202Z"></path><path stroke-width="0" id="E1925-MJMATHI-41" d="M208 74Q208 50 254 46Q272 46 272 35Q272 34 270 22Q267 8 264 4T251 0Q249 0 239 0T205 1T141 2Q70 2 50 0H42Q35 7 35 11Q37 38 48 46H62Q132 49 164 96Q170 102 345 401T523 704Q530 716 547 716H555H572Q578 707 578 706L606 383Q634 60 636 57Q641 46 701 46Q726 46 726 36Q726 34 723 22Q720 7 718 4T704 0Q701 0 690 0T651 1T578 2Q484 2 455 0H443Q437 6 437 9T439 27Q443 40 445 43L449 46H469Q523 49 533 63L521 213H283L249 155Q208 86 208 74ZM516 260Q516 271 504 416T490 562L463 519Q447 492 400 412L310 260L413 259Q516 259 516 260Z"></path><path stroke-width="0" id="E1925-MJSZ1-5D" d="M22 810V850H214V-349H22V-309H174V810H22Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><g transform="translate(167,0)"><g transform="translate(-19,0)"><g transform="translate(0,13810)"><use xlink:href="#E1925-MJMAIN-2207" x="0" y="0"></use><g 
transform="translate(833,0)"><use xlink:href="#E1925-MJMATHI-45" x="0" y="0"></use><g transform="translate(738,-186)"><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-3C0" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-28" x="573" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-3B8" x="962" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-29" x="1431" y="0"></use></g></g><use xlink:href="#E1925-MJMAIN-5B" x="2957" y="0"></use><g transform="translate(3235,0)"><use xlink:href="#E1925-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-30" x="1111" y="-213"></use></g><use xlink:href="#E1925-MJMAIN-5D" x="4475" y="0"></use></g></g><g transform="translate(4735,0)"><g transform="translate(0,13810)"><use xlink:href="#E1925-MJMAIN-3D" x="277" y="0"></use><use xlink:href="#E1925-MJMAIN-2207" x="1333" y="0"></use><use xlink:href="#E1925-MJMATHI-45" x="2166" y="0"></use><use xlink:href="#E1925-MJMAIN-5B" x="2930" y="0"></use><g transform="translate(3208,0)"><use xlink:href="#E1925-MJMATHI-76" x="0" y="0"></use><g transform="translate(485,-186)"><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-3C0" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-28" x="573" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-3B8" x="962" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-29" x="1431" y="0"></use></g></g><use xlink:href="#E1925-MJMAIN-28" x="5080" y="0"></use><g transform="translate(5469,0)"><use xlink:href="#E1925-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-30" x="866" y="-213"></use></g><use xlink:href="#E1925-MJMAIN-29" x="6536" y="0"></use><use xlink:href="#E1925-MJMAIN-5D" x="6925" y="0"></use><use xlink:href="#E1925-MJMAIN-3D" x="7480" y="0"></use><use xlink:href="#E1925-MJMATHI-45" x="8536" y="0"></use><use xlink:href="#E1925-MJMAIN-5B" x="9300" 
y="0"></use><use xlink:href="#E1925-MJMAIN-2207" x="9578" y="0"></use><g transform="translate(10411,0)"><use xlink:href="#E1925-MJMATHI-76" x="0" y="0"></use><g transform="translate(485,-186)"><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-3C0" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-28" x="573" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-3B8" x="962" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-29" x="1431" y="0"></use></g></g><use xlink:href="#E1925-MJMAIN-28" x="12283" y="0"></use><g transform="translate(12672,0)"><use xlink:href="#E1925-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-30" x="866" y="-213"></use></g><use xlink:href="#E1925-MJMAIN-29" x="13739" y="0"></use><use xlink:href="#E1925-MJMAIN-5D" x="14128" y="0"></use></g><g transform="translate(0,11398)"><use xlink:href="#E1925-MJMAIN-3D" x="277" y="0"></use><use xlink:href="#E1925-MJMATHI-45" x="1333" y="0"></use><g transform="translate(2264,0)"><use xlink:href="#E1925-MJSZ4-5B"></use><g transform="translate(583,0)"><use xlink:href="#E1925-MJSZ2-2211" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-61" x="756" y="-1485"></use></g><g transform="translate(2193,0)"><use xlink:href="#E1925-MJMATHI-71" x="0" y="0"></use><g transform="translate(446,-186)"><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-3C0" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-28" x="573" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-3B8" x="962" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-29" x="1431" y="0"></use></g></g><use xlink:href="#E1925-MJMAIN-28" x="4026" y="0"></use><g transform="translate(4415,0)"><use xlink:href="#E1925-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-30" x="866" y="-213"></use></g><use xlink:href="#E1925-MJMAIN-2C" x="5482" 
y="0"></use><use xlink:href="#E1925-MJMATHI-61" x="5926" y="0"></use><use xlink:href="#E1925-MJMAIN-29" x="6455" y="0"></use><use xlink:href="#E1925-MJMAIN-2207" x="6844" y="0"></use><use xlink:href="#E1925-MJMATHI-3C0" x="7677" y="0"></use><use xlink:href="#E1925-MJMAIN-28" x="8250" y="0"></use><use xlink:href="#E1925-MJMATHI-61" x="8639" y="0"></use><use xlink:href="#E1925-MJMAIN-2223" x="9446" y="0"></use><g transform="translate(10002,0)"><use xlink:href="#E1925-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-30" x="866" y="-213"></use></g><use xlink:href="#E1925-MJMAIN-3B" x="11068" y="0"></use><use xlink:href="#E1925-MJMATHI-3B8" x="11513" y="0"></use><use xlink:href="#E1925-MJMAIN-29" x="11982" y="0"></use><use xlink:href="#E1925-MJSZ4-5D" x="12371" y="-1"></use></g><use xlink:href="#E1925-MJMAIN-2B" x="15441" y="0"></use><use xlink:href="#E1925-MJMATHI-3B3" x="16441" y="0"></use><use xlink:href="#E1925-MJMATHI-45" x="16984" y="0"></use><use xlink:href="#E1925-MJMAIN-5B" x="17748" y="0"></use><use xlink:href="#E1925-MJMAIN-2207" x="18026" y="0"></use><g transform="translate(18859,0)"><use xlink:href="#E1925-MJMATHI-76" x="0" y="0"></use><g transform="translate(485,-186)"><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-3C0" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-28" x="573" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-3B8" x="962" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-29" x="1431" y="0"></use></g></g><use xlink:href="#E1925-MJMAIN-28" x="20731" y="0"></use><g transform="translate(21120,0)"><use xlink:href="#E1925-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-31" x="866" y="-213"></use></g><use xlink:href="#E1925-MJMAIN-29" x="22186" y="0"></use><use xlink:href="#E1925-MJMAIN-5D" x="22575" y="0"></use></g><g transform="translate(0,9048)"><use xlink:href="#E1925-MJMAIN-3D" x="277" 
y="0"></use><use xlink:href="#E1925-MJMAIN-22EF" x="1333" y="0"></use></g><g transform="translate(0,6799)"><use xlink:href="#E1925-MJMAIN-3D" x="277" y="0"></use><g transform="translate(1333,0)"><use xlink:href="#E1925-MJSZ2-2211" x="0" y="0"></use><g transform="translate(142,-1088)"><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-74" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-3D" x="361" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-30" x="1139" y="0"></use></g><g transform="translate(93,1150)"><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-2B" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-221E" x="778" y="0"></use></g></g><use xlink:href="#E1925-MJMATHI-45" x="2944" y="0"></use><g transform="translate(3874,0)"><use xlink:href="#E1925-MJSZ4-5B"></use><g transform="translate(583,0)"><use xlink:href="#E1925-MJSZ2-2211" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-61" x="756" y="-1485"></use></g><g transform="translate(2193,0)"><use xlink:href="#E1925-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-74" x="778" y="583"></use></g><g transform="translate(3099,0)"><use xlink:href="#E1925-MJMATHI-71" x="0" y="0"></use><g transform="translate(446,-186)"><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-3C0" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-28" x="573" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-3B8" x="962" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-29" x="1431" y="0"></use></g></g><use xlink:href="#E1925-MJMAIN-28" x="4932" y="0"></use><g transform="translate(5321,0)"><use xlink:href="#E1925-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1925-MJMAIN-2C" x="6289" y="0"></use><use xlink:href="#E1925-MJMATHI-61" x="6734" 
y="0"></use><use xlink:href="#E1925-MJMAIN-29" x="7263" y="0"></use><use xlink:href="#E1925-MJMAIN-2207" x="7652" y="0"></use><use xlink:href="#E1925-MJMATHI-3C0" x="8485" y="0"></use><use xlink:href="#E1925-MJMAIN-28" x="9058" y="0"></use><use xlink:href="#E1925-MJMATHI-61" x="9447" y="0"></use><use xlink:href="#E1925-MJMAIN-2223" x="10254" y="0"></use><g transform="translate(10809,0)"><use xlink:href="#E1925-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1925-MJMAIN-3B" x="11778" y="0"></use><use xlink:href="#E1925-MJMATHI-3B8" x="12222" y="0"></use><use xlink:href="#E1925-MJMAIN-29" x="12691" y="0"></use><use xlink:href="#E1925-MJSZ4-5D" x="13080" y="-1"></use></g></g><g transform="translate(0,3500)"><use xlink:href="#E1925-MJMAIN-3D" x="277" y="0"></use><g transform="translate(1333,0)"><use xlink:href="#E1925-MJSZ2-2211" x="0" y="0"></use><g transform="translate(142,-1088)"><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-74" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-3D" x="361" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-30" x="1139" y="0"></use></g><g transform="translate(93,1150)"><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-2B" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-221E" x="778" y="0"></use></g></g><use xlink:href="#E1925-MJMATHI-45" x="2944" y="0"></use><g transform="translate(3874,0)"><use xlink:href="#E1925-MJSZ4-5B"></use><g transform="translate(583,0)"><use xlink:href="#E1925-MJSZ2-2211" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-61" x="756" y="-1485"></use></g><g transform="translate(2193,0)"><use xlink:href="#E1925-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-74" x="778" y="583"></use></g><g transform="translate(3099,0)"><use xlink:href="#E1925-MJMATHI-71" x="0" y="0"></use><g 
transform="translate(446,-186)"><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-3C0" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-28" x="573" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-3B8" x="962" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-29" x="1431" y="0"></use></g></g><use xlink:href="#E1925-MJMAIN-28" x="4932" y="0"></use><g transform="translate(5321,0)"><use xlink:href="#E1925-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1925-MJMAIN-2C" x="6289" y="0"></use><use xlink:href="#E1925-MJMATHI-61" x="6734" y="0"></use><use xlink:href="#E1925-MJMAIN-29" x="7263" y="0"></use><use xlink:href="#E1925-MJMATHI-3C0" x="7652" y="0"></use><use xlink:href="#E1925-MJMAIN-28" x="8225" y="0"></use><use xlink:href="#E1925-MJMATHI-61" x="8614" y="0"></use><use xlink:href="#E1925-MJMAIN-2223" x="9421" y="0"></use><g transform="translate(9976,0)"><use xlink:href="#E1925-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1925-MJMAIN-3B" x="10945" y="0"></use><use xlink:href="#E1925-MJMATHI-3B8" x="11389" y="0"></use><use xlink:href="#E1925-MJMAIN-29" x="11858" y="0"></use><use xlink:href="#E1925-MJMAIN-2207" x="12247" y="0"></use><g transform="translate(13247,0)"><use xlink:href="#E1925-MJMAIN-6C"></use><use xlink:href="#E1925-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1925-MJMATHI-3C0" x="14248" y="0"></use><use xlink:href="#E1925-MJMAIN-28" x="14821" y="0"></use><use xlink:href="#E1925-MJMATHI-61" x="15210" y="0"></use><use xlink:href="#E1925-MJMAIN-2223" x="16016" y="0"></use><g transform="translate(16572,0)"><use xlink:href="#E1925-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1925-MJMAIN-3B" x="17540" 
y="0"></use><use xlink:href="#E1925-MJMATHI-3B8" x="17985" y="0"></use><use xlink:href="#E1925-MJMAIN-29" x="18454" y="0"></use><use xlink:href="#E1925-MJSZ4-5D" x="18843" y="-1"></use></g></g><g transform="translate(0,201)"><use xlink:href="#E1925-MJMAIN-3D" x="277" y="0"></use><g transform="translate(1333,0)"><use xlink:href="#E1925-MJSZ2-2211" x="0" y="0"></use><g transform="translate(142,-1088)"><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-74" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-3D" x="361" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-30" x="1139" y="0"></use></g><g transform="translate(93,1150)"><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-2B" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-221E" x="778" y="0"></use></g></g><use xlink:href="#E1925-MJMATHI-45" x="2944" y="0"></use><g transform="translate(3874,0)"><use xlink:href="#E1925-MJSZ4-5B"></use><g transform="translate(583,0)"><use xlink:href="#E1925-MJSZ2-2211" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-61" x="756" y="-1485"></use></g><use xlink:href="#E1925-MJMATHI-3C0" x="2193" y="0"></use><use xlink:href="#E1925-MJMAIN-28" x="2766" y="0"></use><use xlink:href="#E1925-MJMATHI-61" x="3155" y="0"></use><use xlink:href="#E1925-MJMAIN-2223" x="3962" y="0"></use><g transform="translate(4518,0)"><use xlink:href="#E1925-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1925-MJMAIN-3B" x="5486" y="0"></use><use xlink:href="#E1925-MJMATHI-3B8" x="5931" y="0"></use><use xlink:href="#E1925-MJMAIN-29" x="6400" y="0"></use><g transform="translate(6789,0)"><use xlink:href="#E1925-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-74" x="778" y="583"></use></g><g transform="translate(7694,0)"><use xlink:href="#E1925-MJMATHI-71" x="0" y="0"></use><g 
transform="translate(446,-186)"><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-3C0" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-28" x="573" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-3B8" x="962" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-29" x="1431" y="0"></use></g></g><use xlink:href="#E1925-MJMAIN-28" x="9527" y="0"></use><g transform="translate(9916,0)"><use xlink:href="#E1925-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1925-MJMAIN-2C" x="10885" y="0"></use><use xlink:href="#E1925-MJMATHI-61" x="11329" y="0"></use><use xlink:href="#E1925-MJMAIN-29" x="11858" y="0"></use><use xlink:href="#E1925-MJMAIN-2207" x="12247" y="0"></use><g transform="translate(13247,0)"><use xlink:href="#E1925-MJMAIN-6C"></use><use xlink:href="#E1925-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1925-MJMATHI-3C0" x="14248" y="0"></use><use xlink:href="#E1925-MJMAIN-28" x="14821" y="0"></use><use xlink:href="#E1925-MJMATHI-61" x="15210" y="0"></use><use xlink:href="#E1925-MJMAIN-2223" x="16016" y="0"></use><g transform="translate(16572,0)"><use xlink:href="#E1925-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1925-MJMAIN-3B" x="17540" y="0"></use><use xlink:href="#E1925-MJMATHI-3B8" x="17985" y="0"></use><use xlink:href="#E1925-MJMAIN-29" x="18454" y="0"></use><use xlink:href="#E1925-MJSZ4-5D" x="18843" y="-1"></use></g></g><g transform="translate(0,-3011)"><use xlink:href="#E1925-MJMAIN-3D" x="277" y="0"></use><g transform="translate(1333,0)"><use xlink:href="#E1925-MJSZ2-2211" x="0" y="0"></use><g transform="translate(142,-1088)"><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-74" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-3D" x="361" y="0"></use><use 
transform="scale(0.707)" xlink:href="#E1925-MJMAIN-30" x="1139" y="0"></use></g><g transform="translate(93,1150)"><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-2B" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-221E" x="778" y="0"></use></g></g><use xlink:href="#E1925-MJMATHI-45" x="2944" y="0"></use><use xlink:href="#E1925-MJSZ1-5B" x="3708" y="-1"></use><g transform="translate(4125,0)"><use xlink:href="#E1925-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-74" x="778" y="583"></use></g><g transform="translate(5031,0)"><use xlink:href="#E1925-MJMATHI-71" x="0" y="0"></use><g transform="translate(446,-186)"><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-3C0" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-28" x="573" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-3B8" x="962" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-29" x="1431" y="0"></use></g></g><use xlink:href="#E1925-MJMAIN-28" x="6863" y="0"></use><g transform="translate(7252,0)"><use xlink:href="#E1925-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1925-MJMAIN-2C" x="8221" y="0"></use><g transform="translate(8665,0)"><use xlink:href="#E1925-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1925-MJMAIN-29" x="9771" y="0"></use><use xlink:href="#E1925-MJMAIN-2207" x="10160" y="0"></use><g transform="translate(11159,0)"><use xlink:href="#E1925-MJMAIN-6C"></use><use xlink:href="#E1925-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1925-MJMATHI-3C0" x="12160" y="0"></use><use xlink:href="#E1925-MJMAIN-28" x="12733" y="0"></use><g transform="translate(13122,0)"><use xlink:href="#E1925-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-74" 
x="1060" y="-213"></use></g><use xlink:href="#E1925-MJMAIN-2223" x="14505" y="0"></use><g transform="translate(15061,0)"><use xlink:href="#E1925-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1925-MJMAIN-3B" x="16029" y="0"></use><use xlink:href="#E1925-MJMATHI-3B8" x="16474" y="0"></use><use xlink:href="#E1925-MJMAIN-29" x="16943" y="0"></use><use xlink:href="#E1925-MJSZ1-5D" x="17332" y="-1"></use></g><g transform="translate(0,-6264)"><use xlink:href="#E1925-MJMAIN-3D" x="277" y="0"></use><use xlink:href="#E1925-MJMATHI-45" x="1333" y="0"></use><g transform="translate(2264,0)"><use xlink:href="#E1925-MJSZ4-5B"></use><g transform="translate(583,0)"><use xlink:href="#E1925-MJSZ2-2211" x="0" y="0"></use><g transform="translate(142,-1088)"><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-74" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-3D" x="361" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-30" x="1139" y="0"></use></g><g transform="translate(93,1150)"><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-2B" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-221E" x="778" y="0"></use></g></g><g transform="translate(2193,0)"><use xlink:href="#E1925-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-74" x="778" y="583"></use></g><g transform="translate(3099,0)"><use xlink:href="#E1925-MJMATHI-71" x="0" y="0"></use><g transform="translate(446,-186)"><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-3C0" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-28" x="573" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-3B8" x="962" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-29" x="1431" y="0"></use></g></g><use xlink:href="#E1925-MJMAIN-28" x="4932" y="0"></use><g 
transform="translate(5321,0)"><use xlink:href="#E1925-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1925-MJMAIN-2C" x="6289" y="0"></use><g transform="translate(6734,0)"><use xlink:href="#E1925-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1925-MJMAIN-29" x="7839" y="0"></use><use xlink:href="#E1925-MJMAIN-2207" x="8228" y="0"></use><g transform="translate(9228,0)"><use xlink:href="#E1925-MJMAIN-6C"></use><use xlink:href="#E1925-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1925-MJMATHI-3C0" x="10228" y="0"></use><use xlink:href="#E1925-MJMAIN-28" x="10801" y="0"></use><g transform="translate(11190,0)"><use xlink:href="#E1925-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1925-MJMAIN-2223" x="12573" y="0"></use><g transform="translate(13129,0)"><use xlink:href="#E1925-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1925-MJMAIN-3B" x="14098" y="0"></use><use xlink:href="#E1925-MJMATHI-3B8" x="14542" y="0"></use><use xlink:href="#E1925-MJMAIN-29" x="15011" y="0"></use><use xlink:href="#E1925-MJSZ4-5D" x="15400" y="-1"></use></g></g><g transform="translate(0,-9563)"><use xlink:href="#E1925-MJMAIN-3D" x="277" y="0"></use><use xlink:href="#E1925-MJMATHI-45" x="1333" y="0"></use><g transform="translate(2264,0)"><use xlink:href="#E1925-MJSZ4-5B"></use><g transform="translate(583,0)"><use xlink:href="#E1925-MJSZ2-2211" x="0" y="0"></use><g transform="translate(142,-1088)"><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-74" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-3D" x="361" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-30" x="1139" 
y="0"></use></g><g transform="translate(93,1150)"><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-2B" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-221E" x="778" y="0"></use></g></g><g transform="translate(2193,0)"><use xlink:href="#E1925-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-74" x="778" y="583"></use></g><use xlink:href="#E1925-MJMATHI-45" x="3099" y="0"></use><use xlink:href="#E1925-MJMAIN-28" x="3863" y="0"></use><g transform="translate(4252,0)"><use xlink:href="#E1925-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-74" x="1111" y="-213"></use></g><use xlink:href="#E1925-MJMAIN-2223" x="5671" y="0"></use><g transform="translate(6227,0)"><use xlink:href="#E1925-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1925-MJMAIN-2C" x="7195" y="0"></use><g transform="translate(7640,0)"><use xlink:href="#E1925-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1925-MJMAIN-29" x="8745" y="0"></use><use xlink:href="#E1925-MJMAIN-2207" x="9134" y="0"></use><g transform="translate(10134,0)"><use xlink:href="#E1925-MJMAIN-6C"></use><use xlink:href="#E1925-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1925-MJMATHI-3C0" x="11134" y="0"></use><use xlink:href="#E1925-MJMAIN-28" x="11707" y="0"></use><g transform="translate(12096,0)"><use xlink:href="#E1925-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1925-MJMAIN-2223" x="13479" y="0"></use><g transform="translate(14035,0)"><use xlink:href="#E1925-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1925-MJMAIN-3B" x="15003" y="0"></use><use 
xlink:href="#E1925-MJMATHI-3B8" x="15448" y="0"></use><use xlink:href="#E1925-MJMAIN-29" x="15917" y="0"></use><use xlink:href="#E1925-MJSZ4-5D" x="16306" y="-1"></use></g></g><g transform="translate(0,-12862)"><use xlink:href="#E1925-MJMAIN-3D" x="277" y="0"></use><use xlink:href="#E1925-MJMATHI-45" x="1333" y="0"></use><g transform="translate(2264,0)"><use xlink:href="#E1925-MJSZ4-5B"></use><g transform="translate(583,0)"><use xlink:href="#E1925-MJSZ2-2211" x="0" y="0"></use><g transform="translate(142,-1088)"><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-74" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-3D" x="361" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-30" x="1139" y="0"></use></g><g transform="translate(93,1150)"><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-2B" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMAIN-221E" x="778" y="0"></use></g></g><g transform="translate(2193,0)"><use xlink:href="#E1925-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-74" x="778" y="583"></use></g><g transform="translate(3099,0)"><use xlink:href="#E1925-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-74" x="1111" y="-213"></use></g><use xlink:href="#E1925-MJMAIN-2207" x="4240" y="0"></use><g transform="translate(5240,0)"><use xlink:href="#E1925-MJMAIN-6C"></use><use xlink:href="#E1925-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1925-MJMATHI-3C0" x="6241" y="0"></use><use xlink:href="#E1925-MJMAIN-28" x="6814" y="0"></use><g transform="translate(7203,0)"><use xlink:href="#E1925-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1925-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1925-MJMAIN-2223" x="8586" y="0"></use><g transform="translate(9141,0)"><use xlink:href="#E1925-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" 
xlink:href="#E1925-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1925-MJMAIN-3B" x="10110" y="0"></use><use xlink:href="#E1925-MJMATHI-3B8" x="10554" y="0"></use><use xlink:href="#E1925-MJMAIN-29" x="11023" y="0"></use><use xlink:href="#E1925-MJSZ4-5D" x="11412" y="-1"></use></g></g></g></g></g></svg></span></div><script type="math/tex; mode=display" id="MathJax-Element-823">\begin{split}
\nabla E_{\pi(\theta)}[G_0] &= \nabla E[v_{\pi(\theta)}(S_0)] = E[\nabla v_{\pi(\theta)}(S_0)] \\
&=E\left[\sum_a q_{\pi(\theta)}(S_0,a)\nabla\pi(a \mid S_0;\theta) \right] + \gamma E[\nabla v_{\pi(\theta)}(S_1)] \\
&= \cdots \\
&=\sum_{t=0}^{+\infty} E \left[\sum_a \gamma^t q_{\pi(\theta)}(S_t,a) \nabla\pi(a \mid S_t;\theta)\right] \\
&=\sum_{t=0}^{+\infty} E \left[\sum_a \gamma^t q_{\pi(\theta)}(S_t,a) \pi(a \mid S_t;\theta) \nabla \ln \pi(a \mid S_t;\theta)\right] \\
&=\sum_{t=0}^{+\infty} E \left[\sum_a \pi(a \mid S_t;\theta) \gamma^t q_{\pi(\theta)}(S_t,a) \nabla \ln \pi(a \mid S_t;\theta)\right] \\
&=\sum_{t=0}^{+\infty} E \big[\gamma^t q_{\pi(\theta)}(S_t,A_t) \nabla \ln \pi(A_t \mid S_t;\theta) \big] \\
&=E \left[\sum_{t=0}^{+\infty} \gamma^t q_{\pi(\theta)}(S_t,A_t) \nabla \ln \pi(A_t \mid S_t;\theta) \right] \\
&=E \left[\sum_{t=0}^{+\infty} \gamma^t E(G_t \mid S_t,A_t) \nabla \ln \pi(A_t \mid S_t;\theta) \right] \\
&=E \left[\sum_{t=0}^{+\infty} \gamma^t G_t \nabla \ln \pi(A_t \mid S_t;\theta) \right] \\
\end{split}</script></div></div><h3><a name="二同策回合更新策略梯度算法" class="md-header-anchor"></a><span>II. On-Policy Episodic-Update Policy Gradient Algorithm</span></h3><p><span>Equation </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="2.968ex" height="2.71ex" viewBox="0 -832.7 1278 1166.9" role="img" focusable="false" style="vertical-align: -0.776ex;"><defs><path stroke-width="0" id="E1955-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1955-MJMAIN-32" d="M109 429Q82 429 66 447T50 491Q50 562 103 614T235 666Q326 666 387 610T449 465Q449 422 429 383T381 315T301 241Q265 210 201 149L142 93L218 92Q375 92 385 97Q392 99 409 186V189H449V186Q448 183 436 95T421 3V0H50V19V31Q50 38 56 46T86 81Q115 113 136 137Q145 147 170 174T204 211T233 244T261 278T284 308T305 340T320 369T333 401T340 431T343 464Q343 527 309 573T212 619Q179 619 154 602T119 569T109 550Q109 549 114 549Q132 549 151 535T170 489Q170 464 154 447T109 429Z"></path><path stroke-width="0" id="E1955-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><a class="mjx-svg-href" xlink:href="#mjx-eqn-eq%3A2"><rect width="1278" height="1000" y="-250" fill="none" stroke="none" pointer-events="all"></rect><g class="MathJax_ref"><use xlink:href="#E1955-MJMAIN-28"></use><use xlink:href="#E1955-MJMAIN-32" x="389" y="0"></use><use xlink:href="#E1955-MJMAIN-29" x="889" y="0"></use></g></a></g></svg></span><script 
type="math/tex">\eqref{eq:2}</script><span> directly yields a policy gradient algorithm, the Vanilla Policy Gradient (VPG) algorithm, whose update at each step is:</span></p><div contenteditable="false" spellcheck="false" class="mathjax-block md-end-block md-math-block md-rawblock" id="mathjax-n18" cid="n18" mdtype="math_block"><div class="md-rawblock-container md-math-container" tabindex="-1"><div class="MathJax_SVG_Display"><span class="MathJax_SVG" id="MathJax-Element-824-Frame" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="98.296ex" height="2.903ex" viewBox="0 -874.2 42321.7 1250" role="img" focusable="false" style="vertical-align: -0.873ex; max-width: 100%;"><defs><path stroke-width="0" id="E1926-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1926-MJMAIN-33" d="M127 463Q100 463 85 480T69 524Q69 579 117 622T233 665Q268 665 277 664Q351 652 390 611T430 522Q430 470 396 421T302 350L299 348Q299 347 308 345T337 336T375 315Q457 262 457 175Q457 96 395 37T238 -22Q158 -22 100 21T42 130Q42 158 60 175T105 193Q133 193 151 175T169 130Q169 119 166 110T159 94T148 82T136 74T126 70T118 67L114 66Q165 21 238 21Q293 21 321 74Q338 107 338 175V195Q338 290 274 322Q259 328 213 329L171 330L168 332Q166 335 166 348Q166 366 174 366Q202 366 232 371Q266 376 294 413T322 525V533Q322 590 287 612Q265 626 240 626Q208 626 181 615T143 592T132 580H135Q138 579 143 578T153 573T165 566T175 555T183 540T186 520Q186 498 172 481T127 463Z"></path><path stroke-width="0" id="E1926-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 
749Z"></path><path stroke-width="0" id="E1926-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E1926-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E1926-MJMAIN-2B" d="M56 237T56 250T70 270H369V420L370 570Q380 583 389 583Q402 583 409 568V270H707Q722 262 722 250T707 230H409V-68Q401 -82 391 -82H389H387Q375 -82 369 -68V230H70Q56 237 56 250Z"></path><path stroke-width="0" id="E1926-MJMAIN-31" d="M213 578L200 573Q186 568 160 563T102 556H83V602H102Q149 604 189 617T245 641T273 663Q275 666 285 666Q294 666 302 660V361L303 61Q310 54 315 52T339 48T401 46H427V0H416Q395 3 257 3Q121 3 100 0H88V46H114Q136 46 152 46T177 47T193 50T201 52T207 57T213 61V578Z"></path><path stroke-width="0" id="E1926-MJMAIN-2190" d="M944 261T944 250T929 230H165Q167 228 182 216T211 189T244 152T277 96T303 25Q308 7 308 0Q308 -11 288 -11Q281 -11 278 -11T272 -7T267 2T263 21Q245 94 195 151T73 236Q58 242 55 247Q55 254 59 257T73 264Q121 283 158 314T215 375T247 434T264 480L267 497Q269 503 270 505T275 509T288 511Q308 511 308 500Q308 493 303 475Q293 438 278 406T246 352T215 315T185 287T165 270H929Q944 261 944 250Z"></path><path stroke-width="0" 
id="E1926-MJMATHI-3B1" d="M34 156Q34 270 120 356T309 442Q379 442 421 402T478 304Q484 275 485 237V208Q534 282 560 374Q564 388 566 390T582 393Q603 393 603 385Q603 376 594 346T558 261T497 161L486 147L487 123Q489 67 495 47T514 26Q528 28 540 37T557 60Q559 67 562 68T577 70Q597 70 597 62Q597 56 591 43Q579 19 556 5T512 -10H505Q438 -10 414 62L411 69L400 61Q390 53 370 41T325 18T267 -2T203 -11Q124 -11 79 39T34 156ZM208 26Q257 26 306 47T379 90L403 112Q401 255 396 290Q382 405 304 405Q235 405 183 332Q156 292 139 224T121 120Q121 71 146 49T208 26Z"></path><path stroke-width="0" id="E1926-MJMATHI-3B3" d="M31 249Q11 249 11 258Q11 275 26 304T66 365T129 418T206 441Q233 441 239 440Q287 429 318 386T371 255Q385 195 385 170Q385 166 386 166L398 193Q418 244 443 300T486 391T508 430Q510 431 524 431H537Q543 425 543 422Q543 418 522 378T463 251T391 71Q385 55 378 6T357 -100Q341 -165 330 -190T303 -216Q286 -216 286 -188Q286 -138 340 32L346 51L347 69Q348 79 348 100Q348 257 291 317Q251 355 196 355Q148 355 108 329T51 260Q49 251 47 251Q45 249 31 249Z"></path><path stroke-width="0" id="E1926-MJMATHI-47" d="M50 252Q50 367 117 473T286 641T490 704Q580 704 633 653Q642 643 648 636T656 626L657 623Q660 623 684 649Q691 655 699 663T715 679T725 690L740 705H746Q760 705 760 698Q760 694 728 561Q692 422 692 421Q690 416 687 415T669 413H653Q647 419 647 422Q647 423 648 429T650 449T651 481Q651 552 619 605T510 659Q492 659 471 656T418 643T357 615T294 567T236 496T189 394T158 260Q156 242 156 221Q156 173 170 136T206 79T256 45T308 28T353 24Q407 24 452 47T514 106Q517 114 529 161T541 214Q541 222 528 224T468 227H431Q425 233 425 235T427 254Q431 267 437 273H454Q494 271 594 271Q634 271 659 271T695 272T707 272Q721 272 721 263Q721 261 719 249Q714 230 709 228Q706 227 694 227Q674 227 653 224Q646 221 643 215T629 164Q620 131 614 108Q589 6 586 3Q584 1 581 1Q571 1 553 21T530 52Q530 53 528 52T522 47Q448 -22 322 -22Q201 -22 126 55T50 252Z"></path><path stroke-width="0" id="E1926-MJMAIN-2207" d="M46 676Q46 679 51 683H781Q786 679 786 676Q786 
674 617 326T444 -26Q439 -33 416 -33T388 -26Q385 -22 216 326T46 676ZM697 596Q697 597 445 597T193 596Q195 591 319 336T445 80L697 596Z"></path><path stroke-width="0" id="E1926-MJMAIN-6C" d="M42 46H56Q95 46 103 60V68Q103 77 103 91T103 124T104 167T104 217T104 272T104 329Q104 366 104 407T104 482T104 542T103 586T103 603Q100 622 89 628T44 637H26V660Q26 683 28 683L38 684Q48 685 67 686T104 688Q121 689 141 690T171 693T182 694H185V379Q185 62 186 60Q190 52 198 49Q219 46 247 46H263V0H255L232 1Q209 2 183 2T145 3T107 3T57 1L34 0H26V46H42Z"></path><path stroke-width="0" id="E1926-MJMAIN-6E" d="M41 46H55Q94 46 102 60V68Q102 77 102 91T102 122T103 161T103 203Q103 234 103 269T102 328V351Q99 370 88 376T43 385H25V408Q25 431 27 431L37 432Q47 433 65 434T102 436Q119 437 138 438T167 441T178 442H181V402Q181 364 182 364T187 369T199 384T218 402T247 421T285 437Q305 442 336 442Q450 438 463 329Q464 322 464 190V104Q464 66 466 59T477 49Q498 46 526 46H542V0H534L510 1Q487 2 460 2T422 3Q319 3 310 0H302V46H318Q379 46 379 62Q380 64 380 200Q379 335 378 343Q372 371 358 385T334 402T308 404Q263 404 229 370Q202 343 195 315T187 232V168V108Q187 78 188 68T191 55T200 49Q221 46 249 46H265V0H257L234 1Q210 2 183 2T145 3Q42 3 33 0H25V46H41Z"></path><path stroke-width="0" id="E1926-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1926-MJMATHI-41" d="M208 74Q208 50 254 46Q272 46 272 35Q272 34 270 22Q267 8 264 4T251 0Q249 0 239 0T205 1T141 2Q70 2 50 0H42Q35 7 35 11Q37 38 48 46H62Q132 49 164 96Q170 102 345 401T523 704Q530 716 547 716H555H572Q578 707 578 706L606 
383Q634 60 636 57Q641 46 701 46Q726 46 726 36Q726 34 723 22Q720 7 718 4T704 0Q701 0 690 0T651 1T578 2Q484 2 455 0H443Q437 6 437 9T439 27Q443 40 445 43L449 46H469Q523 49 533 63L521 213H283L249 155Q208 86 208 74ZM516 260Q516 271 504 416T490 562L463 519Q447 492 400 412L310 260L413 259Q516 259 516 260Z"></path><path stroke-width="0" id="E1926-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path stroke-width="0" id="E1926-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 462Q176 523 208 573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 577T585 459T569 456Q549 456 549 465Q549 471 550 475Q550 478 551 494T553 520Q553 554 544 579T526 616T501 641Q465 662 419 662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 -18T65 -22Q52 -22 52 -14Q52 -11 110 221Q112 227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 153Q144 114 160 87T203 47T255 29T308 24Z"></path><path stroke-width="0" id="E1926-MJMAIN-3B" d="M78 370Q78 394 95 412T138 430Q162 430 180 414T199 371Q199 346 182 328T139 310T96 327T78 370ZM78 60Q78 85 94 103T137 121Q202 121 202 8Q202 -44 183 -94T144 -169T118 -194Q115 -194 106 -186T95 -174Q94 -171 107 -155T137 -107T160 -38Q161 -32 162 -22T165 -4T165 4Q165 5 161 4T142 0Q110 0 94 18T78 60Z"></path><path stroke-width="0" id="E1926-MJMAIN-2C" d="M78 35T78 60T94 103T137 121Q165 121 187 96T210 8Q210 -27 201 -60T180 -117T154 -158T130 -185T117 -194Q113 -194 104 -185T95 -172Q95 -168 106 -156T131 -126T157 -76T173 -3V9L172 8Q170 7 167 6T161 3T152 1T140 0Q113 0 96 17Z"></path><path stroke-width="0" id="E1926-MJMAIN-3D" d="M56 347Q56 360 70 367H707Q722 359 722 347Q722 336 708 328L390 327H72Q56 332 56 347ZM56 153Q56 
168 72 173H708Q722 163 722 153Q722 140 707 133H70Q56 140 56 153Z"></path><path stroke-width="0" id="E1926-MJMAIN-30" d="M96 585Q152 666 249 666Q297 666 345 640T423 548Q460 465 460 320Q460 165 417 83Q397 41 362 16T301 -15T250 -22Q224 -22 198 -16T137 16T82 83Q39 165 39 320Q39 494 96 585ZM321 597Q291 629 250 629Q208 629 178 597Q153 571 145 525T137 333Q137 175 145 125T181 46Q209 16 250 16Q290 16 318 46Q347 76 354 130T362 333Q362 478 354 524T321 597Z"></path><path stroke-width="0" id="E1926-MJMAIN-22EF" d="M78 250Q78 274 95 292T138 310Q162 310 180 294T199 251Q199 226 182 208T139 190T96 207T78 250ZM525 250Q525 274 542 292T585 310Q609 310 627 294T646 251Q646 226 629 208T586 190T543 207T525 250ZM972 250Q972 274 989 292T1032 310Q1056 310 1074 294T1093 251Q1093 226 1076 208T1033 190T990 207T972 250Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><g transform="translate(41043,0)"><g id="mjx-eqn-eq:3" transform="translate(0,-53)"><use xlink:href="#E1926-MJMAIN-28"></use><use xlink:href="#E1926-MJMAIN-33" x="389" y="0"></use><use xlink:href="#E1926-MJMAIN-29" x="889" y="0"></use></g></g><g transform="translate(10003,0)"><g transform="translate(-19,0)"><g transform="translate(0,-53)"><use xlink:href="#E1926-MJMATHI-3B8" x="0" y="0"></use><g transform="translate(469,-150)"><use transform="scale(0.707)" xlink:href="#E1926-MJMATHI-74" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1926-MJMAIN-2B" x="361" y="0"></use><use transform="scale(0.707)" xlink:href="#E1926-MJMAIN-31" x="1139" y="0"></use></g><use xlink:href="#E1926-MJMAIN-2190" x="2005" y="0"></use><g transform="translate(3283,0)"><use xlink:href="#E1926-MJMATHI-3B8" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1926-MJMATHI-74" x="663" y="-213"></use></g><use xlink:href="#E1926-MJMAIN-2B" x="4329" y="0"></use><use xlink:href="#E1926-MJMATHI-3B1" x="5330" y="0"></use><g transform="translate(5970,0)"><use 
xlink:href="#E1926-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1926-MJMATHI-74" x="778" y="583"></use></g><g transform="translate(6876,0)"><use xlink:href="#E1926-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1926-MJMATHI-74" x="1111" y="-213"></use></g><use xlink:href="#E1926-MJMAIN-2207" x="8017" y="0"></use><g transform="translate(9016,0)"><use xlink:href="#E1926-MJMAIN-6C"></use><use xlink:href="#E1926-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1926-MJMATHI-3C0" x="10017" y="0"></use><use xlink:href="#E1926-MJMAIN-28" x="10590" y="0"></use><g transform="translate(10979,0)"><use xlink:href="#E1926-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1926-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1926-MJMAIN-2223" x="12362" y="0"></use><g transform="translate(12918,0)"><use xlink:href="#E1926-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1926-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1926-MJMAIN-3B" x="13886" y="0"></use><use xlink:href="#E1926-MJMATHI-3B8" x="14331" y="0"></use><use xlink:href="#E1926-MJMAIN-29" x="14800" y="0"></use><use xlink:href="#E1926-MJMAIN-2C" x="15467" y="0"></use><use xlink:href="#E1926-MJMATHI-74" x="17911" y="0"></use><use xlink:href="#E1926-MJMAIN-3D" x="18550" y="0"></use><use xlink:href="#E1926-MJMAIN-30" x="19606" y="0"></use><use xlink:href="#E1926-MJMAIN-2C" x="20106" y="0"></use><use xlink:href="#E1926-MJMAIN-31" x="20551" y="0"></use><use xlink:href="#E1926-MJMAIN-2C" x="21051" y="0"></use><use xlink:href="#E1926-MJMAIN-22EF" x="21495" y="0"></use></g></g></g></g></svg></span></div><script type="math/tex; mode=display" id="MathJax-Element-824">\theta_{t+1} \leftarrow \theta_t + \alpha \gamma^t G_t \nabla \ln \pi(A_t \mid S_t; \theta)\;, \qquad t=0,1,\cdots
\label{eq:3}</script></div></div><p><span>Iterating this update over a complete episode trajectory implements </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="35.658ex" height="6.955ex" viewBox="0 -1746.4 15352.5 2994.3" role="img" focusable="false" style="vertical-align: -2.899ex;"><defs><path stroke-width="0" id="E2007-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E2007-MJMAIN-2190" d="M944 261T944 250T929 230H165Q167 228 182 216T211 189T244 152T277 96T303 25Q308 7 308 0Q308 -11 288 -11Q281 -11 278 -11T272 -7T267 2T263 21Q245 94 195 151T73 236Q58 242 55 247Q55 254 59 257T73 264Q121 283 158 314T215 375T247 434T264 480L267 497Q269 503 270 505T275 509T288 511Q308 511 308 500Q308 493 303 475Q293 438 278 406T246 352T215 315T185 287T165 270H929Q944 261 944 250Z"></path><path stroke-width="0" id="E2007-MJMAIN-2B" d="M56 237T56 250T70 270H369V420L370 570Q380 583 389 583Q402 583 409 568V270H707Q722 262 722 250T707 230H409V-68Q401 -82 391 -82H389H387Q375 -82 369 -68V230H70Q56 237 56 250Z"></path><path stroke-width="0" id="E2007-MJMATHI-3B1" d="M34 156Q34 270 120 356T309 442Q379 442 421 402T478 304Q484 275 485 237V208Q534 282 560 374Q564 388 566 390T582 393Q603 393 603 385Q603 376 594 346T558 261T497 161L486 147L487 123Q489 67 495 47T514 26Q528 28 540 37T557 60Q559 67 562 68T577 70Q597 70 597 62Q597 56 591 43Q579 19 556 5T512 -10H505Q438 -10 414 62L411 69L400 61Q390 53 370 41T325 18T267 -2T203 -11Q124 -11 79 39T34 156ZM208 26Q257 26 306 47T379 90L403 112Q401 255 396 290Q382 405 304 405Q235 405 183 332Q156 292 139 224T121 120Q121 71 146 
49T208 26Z"></path><path stroke-width="0" id="E2007-MJSZ2-2211" d="M60 948Q63 950 665 950H1267L1325 815Q1384 677 1388 669H1348L1341 683Q1320 724 1285 761Q1235 809 1174 838T1033 881T882 898T699 902H574H543H251L259 891Q722 258 724 252Q725 250 724 246Q721 243 460 -56L196 -356Q196 -357 407 -357Q459 -357 548 -357T676 -358Q812 -358 896 -353T1063 -332T1204 -283T1307 -196Q1328 -170 1348 -124H1388Q1388 -125 1381 -145T1356 -210T1325 -294L1267 -449L666 -450Q64 -450 61 -448Q55 -446 55 -439Q55 -437 57 -433L590 177Q590 178 557 222T452 366T322 544L56 909L55 924Q55 945 60 948Z"></path><path stroke-width="0" id="E2007-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E2007-MJMAIN-3D" d="M56 347Q56 360 70 367H707Q722 359 722 347Q722 336 708 328L390 327H72Q56 332 56 347ZM56 153Q56 168 72 173H708Q722 163 722 153Q722 140 707 133H70Q56 140 56 153Z"></path><path stroke-width="0" id="E2007-MJMAIN-30" d="M96 585Q152 666 249 666Q297 666 345 640T423 548Q460 465 460 320Q460 165 417 83Q397 41 362 16T301 -15T250 -22Q224 -22 198 -16T137 16T82 83Q39 165 39 320Q39 494 96 585ZM321 597Q291 629 250 629Q208 629 178 597Q153 571 145 525T137 333Q137 175 145 125T181 46Q209 16 250 16Q290 16 318 46Q347 76 354 130T362 333Q362 478 354 524T321 597Z"></path><path stroke-width="0" id="E2007-MJMAIN-221E" d="M55 217Q55 305 111 373T254 442Q342 442 419 381Q457 350 493 303L507 284L514 294Q618 442 747 442Q833 442 888 374T944 214Q944 128 889 59T743 -11Q657 -11 580 50Q542 81 506 128L492 147L485 137Q381 -11 252 -11Q166 -11 111 57T55 217ZM907 
217Q907 285 869 341T761 397Q740 397 720 392T682 378T648 359T619 335T594 310T574 285T559 263T548 246L543 238L574 198Q605 158 622 138T664 94T714 61T765 51Q827 51 867 100T907 217ZM92 214Q92 145 131 89T239 33Q357 33 456 193L425 233Q364 312 334 337Q285 380 233 380Q171 380 132 331T92 214Z"></path><path stroke-width="0" id="E2007-MJMATHI-3B3" d="M31 249Q11 249 11 258Q11 275 26 304T66 365T129 418T206 441Q233 441 239 440Q287 429 318 386T371 255Q385 195 385 170Q385 166 386 166L398 193Q418 244 443 300T486 391T508 430Q510 431 524 431H537Q543 425 543 422Q543 418 522 378T463 251T391 71Q385 55 378 6T357 -100Q341 -165 330 -190T303 -216Q286 -216 286 -188Q286 -138 340 32L346 51L347 69Q348 79 348 100Q348 257 291 317Q251 355 196 355Q148 355 108 329T51 260Q49 251 47 251Q45 249 31 249Z"></path><path stroke-width="0" id="E2007-MJMATHI-47" d="M50 252Q50 367 117 473T286 641T490 704Q580 704 633 653Q642 643 648 636T656 626L657 623Q660 623 684 649Q691 655 699 663T715 679T725 690L740 705H746Q760 705 760 698Q760 694 728 561Q692 422 692 421Q690 416 687 415T669 413H653Q647 419 647 422Q647 423 648 429T650 449T651 481Q651 552 619 605T510 659Q492 659 471 656T418 643T357 615T294 567T236 496T189 394T158 260Q156 242 156 221Q156 173 170 136T206 79T256 45T308 28T353 24Q407 24 452 47T514 106Q517 114 529 161T541 214Q541 222 528 224T468 227H431Q425 233 425 235T427 254Q431 267 437 273H454Q494 271 594 271Q634 271 659 271T695 272T707 272Q721 272 721 263Q721 261 719 249Q714 230 709 228Q706 227 694 227Q674 227 653 224Q646 221 643 215T629 164Q620 131 614 108Q589 6 586 3Q584 1 581 1Q571 1 553 21T530 52Q530 53 528 52T522 47Q448 -22 322 -22Q201 -22 126 55T50 252Z"></path><path stroke-width="0" id="E2007-MJMAIN-2207" d="M46 676Q46 679 51 683H781Q786 679 786 676Q786 674 617 326T444 -26Q439 -33 416 -33T388 -26Q385 -22 216 326T46 676ZM697 596Q697 597 445 597T193 596Q195 591 319 336T445 80L697 596Z"></path><path stroke-width="0" id="E2007-MJMAIN-6C" d="M42 46H56Q95 46 103 60V68Q103 77 103 91T103 124T104 167T104 217T104 
272T104 329Q104 366 104 407T104 482T104 542T103 586T103 603Q100 622 89 628T44 637H26V660Q26 683 28 683L38 684Q48 685 67 686T104 688Q121 689 141 690T171 693T182 694H185V379Q185 62 186 60Q190 52 198 49Q219 46 247 46H263V0H255L232 1Q209 2 183 2T145 3T107 3T57 1L34 0H26V46H42Z"></path><path stroke-width="0" id="E2007-MJMAIN-6E" d="M41 46H55Q94 46 102 60V68Q102 77 102 91T102 122T103 161T103 203Q103 234 103 269T102 328V351Q99 370 88 376T43 385H25V408Q25 431 27 431L37 432Q47 433 65 434T102 436Q119 437 138 438T167 441T178 442H181V402Q181 364 182 364T187 369T199 384T218 402T247 421T285 437Q305 442 336 442Q450 438 463 329Q464 322 464 190V104Q464 66 466 59T477 49Q498 46 526 46H542V0H534L510 1Q487 2 460 2T422 3Q319 3 310 0H302V46H318Q379 46 379 62Q380 64 380 200Q379 335 378 343Q372 371 358 385T334 402T308 404Q263 404 229 370Q202 343 195 315T187 232V168V108Q187 78 188 68T191 55T200 49Q221 46 249 46H265V0H257L234 1Q210 2 183 2T145 3Q42 3 33 0H25V46H41Z"></path><path stroke-width="0" id="E2007-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E2007-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E2007-MJMATHI-41" d="M208 74Q208 50 254 46Q272 46 272 35Q272 34 270 22Q267 8 264 4T251 0Q249 0 239 0T205 1T141 2Q70 2 50 0H42Q35 7 35 11Q37 38 48 46H62Q132 49 164 96Q170 102 345 401T523 
704Q530 716 547 716H555H572Q578 707 578 706L606 383Q634 60 636 57Q641 46 701 46Q726 46 726 36Q726 34 723 22Q720 7 718 4T704 0Q701 0 690 0T651 1T578 2Q484 2 455 0H443Q437 6 437 9T439 27Q443 40 445 43L449 46H469Q523 49 533 63L521 213H283L249 155Q208 86 208 74ZM516 260Q516 271 504 416T490 562L463 519Q447 492 400 412L310 260L413 259Q516 259 516 260Z"></path><path stroke-width="0" id="E2007-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path stroke-width="0" id="E2007-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 462Q176 523 208 573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 577T585 459T569 456Q549 456 549 465Q549 471 550 475Q550 478 551 494T553 520Q553 554 544 579T526 616T501 641Q465 662 419 662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 -18T65 -22Q52 -22 52 -14Q52 -11 110 221Q112 227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 153Q144 114 160 87T203 47T255 29T308 24Z"></path><path stroke-width="0" id="E2007-MJMAIN-3B" d="M78 370Q78 394 95 412T138 430Q162 430 180 414T199 371Q199 346 182 328T139 310T96 327T78 370ZM78 60Q78 85 94 103T137 121Q202 121 202 8Q202 -44 183 -94T144 -169T118 -194Q115 -194 106 -186T95 -174Q94 -171 107 -155T137 -107T160 -38Q161 -32 162 -22T165 -4T165 4Q165 5 161 4T142 0Q110 0 94 18T78 60Z"></path><path stroke-width="0" id="E2007-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path></defs><g stroke="currentColor" fill="currentColor" 
stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E2007-MJMATHI-3B8" x="0" y="0"></use><use xlink:href="#E2007-MJMAIN-2190" x="746" y="0"></use><use xlink:href="#E2007-MJMATHI-3B8" x="2024" y="0"></use><use xlink:href="#E2007-MJMAIN-2B" x="2715" y="0"></use><use xlink:href="#E2007-MJMATHI-3B1" x="3716" y="0"></use><g transform="translate(4522,0)"><use xlink:href="#E2007-MJSZ2-2211" x="0" y="0"></use><g transform="translate(142,-1088)"><use transform="scale(0.707)" xlink:href="#E2007-MJMATHI-74" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2007-MJMAIN-3D" x="361" y="0"></use><use transform="scale(0.707)" xlink:href="#E2007-MJMAIN-30" x="1139" y="0"></use></g><g transform="translate(93,1150)"><use transform="scale(0.707)" xlink:href="#E2007-MJMAIN-2B" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2007-MJMAIN-221E" x="778" y="0"></use></g></g><g transform="translate(6133,0)"><use xlink:href="#E2007-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2007-MJMATHI-74" x="778" y="583"></use></g><g transform="translate(7039,0)"><use xlink:href="#E2007-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2007-MJMATHI-74" x="1111" y="-213"></use></g><use xlink:href="#E2007-MJMAIN-2207" x="8180" y="0"></use><g transform="translate(9180,0)"><use xlink:href="#E2007-MJMAIN-6C"></use><use xlink:href="#E2007-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E2007-MJMATHI-3C0" x="10180" y="0"></use><use xlink:href="#E2007-MJMAIN-28" x="10753" y="0"></use><g transform="translate(11142,0)"><use xlink:href="#E2007-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2007-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E2007-MJMAIN-2223" x="12525" y="0"></use><g transform="translate(13081,0)"><use xlink:href="#E2007-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2007-MJMATHI-74" x="866" y="-213"></use></g><use 
xlink:href="#E2007-MJMAIN-3B" x="14049" y="0"></use><use xlink:href="#E2007-MJMATHI-3B8" x="14494" y="0"></use><use xlink:href="#E2007-MJMAIN-29" x="14963" y="0"></use></g></svg></span><script type="math/tex">\displaystyle \theta \leftarrow \theta + \alpha \sum_{t=0}^{+\infty} \gamma^t G_t \nabla \ln \pi(A_t \mid S_t; \theta)</script><span>. R. Williams introduced this algorithm in the paper “Simple statistical gradient-following algorithms for connectionist reinforcement learning” and named it “REward Increment = Nonnegative Factor </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="1.807ex" height="1.36ex" viewBox="0 -541.9 778 585.5" role="img" focusable="false" style="vertical-align: 0.021ex; margin-bottom: -0.122ex;"><defs><path stroke-width="0" id="E1958-MJMAIN-D7" d="M630 29Q630 9 609 9Q604 9 587 25T493 118L389 222L284 117Q178 13 175 11Q171 9 168 9Q160 9 154 15T147 29Q147 36 161 51T255 146L359 250L255 354Q174 435 161 449T147 471Q147 480 153 485T168 490Q173 490 175 489Q178 487 284 383L389 
278L493 382Q570 459 587 475T609 491Q630 491 630 471Q630 464 620 453T522 355L418 250L522 145Q606 61 618 48T630 29Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1958-MJMAIN-D7" x="0" y="0"></use></g></svg></span><script type="math/tex">\times</script><span> Characteristic Eligibility” (REINFORCE), meaning that the increment </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="23.724ex" height="2.807ex" viewBox="0 -874.2 10214.4 1208.4" role="img" focusable="false" style="vertical-align: -0.776ex;"><defs><path stroke-width="0" id="E1959-MJMATHI-3B1" d="M34 156Q34 270 120 356T309 442Q379 442 421 402T478 304Q484 275 485 237V208Q534 282 560 374Q564 388 566 390T582 393Q603 393 603 385Q603 376 594 346T558 261T497 161L486 147L487 123Q489 67 495 47T514 26Q528 28 540 37T557 60Q559 67 562 68T577 70Q597 70 597 62Q597 56 591 43Q579 19 556 5T512 -10H505Q438 -10 414 62L411 69L400 61Q390 53 370 41T325 18T267 -2T203 -11Q124 -11 79 39T34 156ZM208 26Q257 26 306 47T379 90L403 112Q401 255 396 290Q382 405 304 405Q235 405 183 332Q156 292 139 224T121 120Q121 71 146 49T208 26Z"></path><path stroke-width="0" id="E1959-MJMATHI-3B3" d="M31 249Q11 249 11 258Q11 275 26 304T66 365T129 418T206 441Q233 441 239 440Q287 429 318 386T371 255Q385 195 385 170Q385 166 386 166L398 193Q418 244 443 300T486 391T508 430Q510 431 524 431H537Q543 425 543 422Q543 418 522 378T463 251T391 71Q385 55 378 6T357 -100Q341 -165 330 -190T303 -216Q286 -216 286 -188Q286 -138 340 32L346 51L347 69Q348 79 348 100Q348 257 291 317Q251 355 196 355Q148 355 108 329T51 260Q49 251 47 251Q45 249 31 249Z"></path><path stroke-width="0" id="E1959-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 
420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E1959-MJMATHI-47" d="M50 252Q50 367 117 473T286 641T490 704Q580 704 633 653Q642 643 648 636T656 626L657 623Q660 623 684 649Q691 655 699 663T715 679T725 690L740 705H746Q760 705 760 698Q760 694 728 561Q692 422 692 421Q690 416 687 415T669 413H653Q647 419 647 422Q647 423 648 429T650 449T651 481Q651 552 619 605T510 659Q492 659 471 656T418 643T357 615T294 567T236 496T189 394T158 260Q156 242 156 221Q156 173 170 136T206 79T256 45T308 28T353 24Q407 24 452 47T514 106Q517 114 529 161T541 214Q541 222 528 224T468 227H431Q425 233 425 235T427 254Q431 267 437 273H454Q494 271 594 271Q634 271 659 271T695 272T707 272Q721 272 721 263Q721 261 719 249Q714 230 709 228Q706 227 694 227Q674 227 653 224Q646 221 643 215T629 164Q620 131 614 108Q589 6 586 3Q584 1 581 1Q571 1 553 21T530 52Q530 53 528 52T522 47Q448 -22 322 -22Q201 -22 126 55T50 252Z"></path><path stroke-width="0" id="E1959-MJMAIN-2207" d="M46 676Q46 679 51 683H781Q786 679 786 676Q786 674 617 326T444 -26Q439 -33 416 -33T388 -26Q385 -22 216 326T46 676ZM697 596Q697 597 445 597T193 596Q195 591 319 336T445 80L697 596Z"></path><path stroke-width="0" id="E1959-MJMAIN-6C" d="M42 46H56Q95 46 103 60V68Q103 77 103 91T103 124T104 167T104 217T104 272T104 329Q104 366 104 407T104 482T104 542T103 586T103 603Q100 622 89 628T44 637H26V660Q26 683 28 683L38 684Q48 685 67 686T104 688Q121 689 141 690T171 693T182 694H185V379Q185 62 186 60Q190 52 198 49Q219 46 247 46H263V0H255L232 1Q209 2 183 2T145 3T107 3T57 1L34 0H26V46H42Z"></path><path stroke-width="0" id="E1959-MJMAIN-6E" d="M41 46H55Q94 46 102 60V68Q102 77 102 91T102 122T103 161T103 203Q103 234 103 269T102 328V351Q99 370 88 376T43 385H25V408Q25 431 27 431L37 432Q47 433 65 434T102 436Q119 437 138 438T167 441T178 
442H181V402Q181 364 182 364T187 369T199 384T218 402T247 421T285 437Q305 442 336 442Q450 438 463 329Q464 322 464 190V104Q464 66 466 59T477 49Q498 46 526 46H542V0H534L510 1Q487 2 460 2T422 3Q319 3 310 0H302V46H318Q379 46 379 62Q380 64 380 200Q379 335 378 343Q372 371 358 385T334 402T308 404Q263 404 229 370Q202 343 195 315T187 232V168V108Q187 78 188 68T191 55T200 49Q221 46 249 46H265V0H257L234 1Q210 2 183 2T145 3Q42 3 33 0H25V46H41Z"></path><path stroke-width="0" id="E1959-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1959-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1959-MJMATHI-41" d="M208 74Q208 50 254 46Q272 46 272 35Q272 34 270 22Q267 8 264 4T251 0Q249 0 239 0T205 1T141 2Q70 2 50 0H42Q35 7 35 11Q37 38 48 46H62Q132 49 164 96Q170 102 345 401T523 704Q530 716 547 716H555H572Q578 707 578 706L606 383Q634 60 636 57Q641 46 701 46Q726 46 726 36Q726 34 723 22Q720 7 718 4T704 0Q701 0 690 0T651 1T578 2Q484 2 455 0H443Q437 6 437 9T439 27Q443 40 445 43L449 46H469Q523 49 533 63L521 213H283L249 155Q208 86 208 74ZM516 260Q516 271 504 416T490 562L463 519Q447 492 400 412L310 260L413 259Q516 259 516 260Z"></path><path stroke-width="0" id="E1959-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path 
stroke-width="0" id="E1959-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 462Q176 523 208 573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 577T585 459T569 456Q549 456 549 465Q549 471 550 475Q550 478 551 494T553 520Q553 554 544 579T526 616T501 641Q465 662 419 662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 -18T65 -22Q52 -22 52 -14Q52 -11 110 221Q112 227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 153Q144 114 160 87T203 47T255 29T308 24Z"></path><path stroke-width="0" id="E1959-MJMAIN-3B" d="M78 370Q78 394 95 412T138 430Q162 430 180 414T199 371Q199 346 182 328T139 310T96 327T78 370ZM78 60Q78 85 94 103T137 121Q202 121 202 8Q202 -44 183 -94T144 -169T118 -194Q115 -194 106 -186T95 -174Q94 -171 107 -155T137 -107T160 -38Q161 -32 162 -22T165 -4T165 4Q165 5 161 4T142 0Q110 0 94 18T78 60Z"></path><path stroke-width="0" id="E1959-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E1959-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1959-MJMATHI-3B1" 
x="0" y="0"></use><g transform="translate(640,0)"><use xlink:href="#E1959-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1959-MJMATHI-74" x="778" y="513"></use></g><g transform="translate(1545,0)"><use xlink:href="#E1959-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1959-MJMATHI-74" x="1111" y="-213"></use></g><use xlink:href="#E1959-MJMAIN-2207" x="2687" y="0"></use><g transform="translate(3686,0)"><use xlink:href="#E1959-MJMAIN-6C"></use><use xlink:href="#E1959-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1959-MJMATHI-3C0" x="4687" y="0"></use><use xlink:href="#E1959-MJMAIN-28" x="5260" y="0"></use><g transform="translate(5649,0)"><use xlink:href="#E1959-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1959-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1959-MJMAIN-2223" x="7032" y="0"></use><g transform="translate(7588,0)"><use xlink:href="#E1959-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1959-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1959-MJMAIN-3B" x="8556" y="0"></use><g transform="translate(9001,0)"><use xlink:href="#E1959-MJMATHI-3B8" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1959-MJMATHI-74" x="663" y="-213"></use></g><use xlink:href="#E1959-MJMAIN-29" x="9825" y="0"></use></g></svg></span><script type="math/tex">\alpha \gamma^t G_t \nabla \ln \pi(A_t \mid S_t; \theta_t)</script><span> is the product of three factors. When the parameters are learned with an automatic-differentiation package, the per-step loss can be defined as </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="21.285ex" height="2.807ex" viewBox="0 -874.2 9164.2 1208.4" role="img" focusable="false" style="vertical-align: -0.776ex;"><defs><path stroke-width="0" id="E1960-MJMAIN-2212" d="M84 237T84 250T98 270H679Q694 262 694 250T679 230H98Q84 237 84 250Z"></path><path stroke-width="0" id="E1960-MJMATHI-3B3" d="M31 249Q11 249 11 
258Q11 275 26 304T66 365T129 418T206 441Q233 441 239 440Q287 429 318 386T371 255Q385 195 385 170Q385 166 386 166L398 193Q418 244 443 300T486 391T508 430Q510 431 524 431H537Q543 425 543 422Q543 418 522 378T463 251T391 71Q385 55 378 6T357 -100Q341 -165 330 -190T303 -216Q286 -216 286 -188Q286 -138 340 32L346 51L347 69Q348 79 348 100Q348 257 291 317Q251 355 196 355Q148 355 108 329T51 260Q49 251 47 251Q45 249 31 249Z"></path><path stroke-width="0" id="E1960-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E1960-MJMATHI-47" d="M50 252Q50 367 117 473T286 641T490 704Q580 704 633 653Q642 643 648 636T656 626L657 623Q660 623 684 649Q691 655 699 663T715 679T725 690L740 705H746Q760 705 760 698Q760 694 728 561Q692 422 692 421Q690 416 687 415T669 413H653Q647 419 647 422Q647 423 648 429T650 449T651 481Q651 552 619 605T510 659Q492 659 471 656T418 643T357 615T294 567T236 496T189 394T158 260Q156 242 156 221Q156 173 170 136T206 79T256 45T308 28T353 24Q407 24 452 47T514 106Q517 114 529 161T541 214Q541 222 528 224T468 227H431Q425 233 425 235T427 254Q431 267 437 273H454Q494 271 594 271Q634 271 659 271T695 272T707 272Q721 272 721 263Q721 261 719 249Q714 230 709 228Q706 227 694 227Q674 227 653 224Q646 221 643 215T629 164Q620 131 614 108Q589 6 586 3Q584 1 581 1Q571 1 553 21T530 52Q530 53 528 52T522 47Q448 -22 322 -22Q201 -22 126 55T50 252Z"></path><path stroke-width="0" id="E1960-MJMAIN-6C" d="M42 46H56Q95 46 103 60V68Q103 77 103 91T103 124T104 167T104 217T104 272T104 329Q104 366 104 407T104 482T104 542T103 586T103 
603Q100 622 89 628T44 637H26V660Q26 683 28 683L38 684Q48 685 67 686T104 688Q121 689 141 690T171 693T182 694H185V379Q185 62 186 60Q190 52 198 49Q219 46 247 46H263V0H255L232 1Q209 2 183 2T145 3T107 3T57 1L34 0H26V46H42Z"></path><path stroke-width="0" id="E1960-MJMAIN-6E" d="M41 46H55Q94 46 102 60V68Q102 77 102 91T102 122T103 161T103 203Q103 234 103 269T102 328V351Q99 370 88 376T43 385H25V408Q25 431 27 431L37 432Q47 433 65 434T102 436Q119 437 138 438T167 441T178 442H181V402Q181 364 182 364T187 369T199 384T218 402T247 421T285 437Q305 442 336 442Q450 438 463 329Q464 322 464 190V104Q464 66 466 59T477 49Q498 46 526 46H542V0H534L510 1Q487 2 460 2T422 3Q319 3 310 0H302V46H318Q379 46 379 62Q380 64 380 200Q379 335 378 343Q372 371 358 385T334 402T308 404Q263 404 229 370Q202 343 195 315T187 232V168V108Q187 78 188 68T191 55T200 49Q221 46 249 46H265V0H257L234 1Q210 2 183 2T145 3Q42 3 33 0H25V46H41Z"></path><path stroke-width="0" id="E1960-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1960-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1960-MJMATHI-41" d="M208 74Q208 50 254 46Q272 46 272 35Q272 34 270 22Q267 8 264 4T251 0Q249 0 239 0T205 1T141 2Q70 2 50 0H42Q35 7 35 11Q37 38 48 46H62Q132 49 164 96Q170 102 345 401T523 704Q530 716 547 716H555H572Q578 707 578 706L606 383Q634 60 
636 57Q641 46 701 46Q726 46 726 36Q726 34 723 22Q720 7 718 4T704 0Q701 0 690 0T651 1T578 2Q484 2 455 0H443Q437 6 437 9T439 27Q443 40 445 43L449 46H469Q523 49 533 63L521 213H283L249 155Q208 86 208 74ZM516 260Q516 271 504 416T490 562L463 519Q447 492 400 412L310 260L413 259Q516 259 516 260Z"></path><path stroke-width="0" id="E1960-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path stroke-width="0" id="E1960-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 462Q176 523 208 573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 577T585 459T569 456Q549 456 549 465Q549 471 550 475Q550 478 551 494T553 520Q553 554 544 579T526 616T501 641Q465 662 419 662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 -18T65 -22Q52 -22 52 -14Q52 -11 110 221Q112 227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 153Q144 114 160 87T203 47T255 29T308 24Z"></path><path stroke-width="0" id="E1960-MJMAIN-3B" d="M78 370Q78 394 95 412T138 430Q162 430 180 414T199 371Q199 346 182 328T139 310T96 327T78 370ZM78 60Q78 85 94 103T137 121Q202 121 202 8Q202 -44 183 -94T144 -169T118 -194Q115 -194 106 -186T95 -174Q94 -171 107 -155T137 -107T160 -38Q161 -32 162 -22T165 -4T165 4Q165 5 161 4T142 0Q110 0 94 18T78 60Z"></path><path stroke-width="0" id="E1960-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 
208T113 132Z"></path><path stroke-width="0" id="E1960-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1960-MJMAIN-2212" x="0" y="0"></use><g transform="translate(778,0)"><use xlink:href="#E1960-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1960-MJMATHI-74" x="778" y="513"></use></g><g transform="translate(1683,0)"><use xlink:href="#E1960-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1960-MJMATHI-74" x="1111" y="-213"></use></g><g transform="translate(2991,0)"><use xlink:href="#E1960-MJMAIN-6C"></use><use xlink:href="#E1960-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1960-MJMATHI-3C0" x="3992" y="0"></use><use xlink:href="#E1960-MJMAIN-28" x="4565" y="0"></use><g transform="translate(4954,0)"><use xlink:href="#E1960-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1960-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1960-MJMAIN-2223" x="6337" y="0"></use><g transform="translate(6893,0)"><use xlink:href="#E1960-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1960-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1960-MJMAIN-3B" x="7861" y="0"></use><use xlink:href="#E1960-MJMATHI-3B8" x="8306" y="0"></use><use xlink:href="#E1960-MJMAIN-29" x="8775" y="0"></use></g></svg></span><script type="math/tex">-\gamma^t G_t \ln \pi(A_t \mid S_t; \theta)</script><span>, so that a gradient-descent step on this loss reproduces the increment above; the package then computes the gradient automatically. The algorithm is as follows:</span></p><div contenteditable="false" spellcheck="false" class="mathjax-block md-end-block md-math-block md-rawblock" id="mathjax-n20" cid="n20" mdtype="math_block"><div class="md-rawblock-container md-math-container" 
tabindex="-1"><div class="MathJax_SVG_Display"><span class="MathJax_SVG" id="MathJax-Element-825-Frame" tabindex="-1" style="font-size: 100%; display: inline-block; zoom: 0.969453;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="101.491ex" height="47.951ex" viewBox="-18.1 -43.5 43697.4 20645.7" role="img" focusable="false" style="vertical-align: -47.85ex; margin-left: -0.042ex; max-width: 100%;"><defs><path stroke-width="0" id="E1927-MJMAINB-37" d="M256 -11Q231 -11 208 5T185 65Q185 105 193 146T212 220T241 289T275 349T312 402T346 445T377 479T397 502L400 504H301Q156 503 150 497Q142 491 134 456T126 407H64V411Q65 414 82 544T99 675T130 676H161V673Q161 669 162 666T167 661T173 657T181 654T190 652T200 651T210 650T220 649T229 648Q237 648 254 647T276 646Q277 646 426 644H558V620V607Q558 596 551 586T509 537Q489 515 476 500Q390 401 384 393Q349 339 337 259T324 113T322 38Q307 -11 256 -11Z"></path><path stroke-width="0" id="E1927-MJMAINB-2D" d="M13 166V278H318V166H13Z"></path><path stroke-width="0" id="E1927-MJMAINB-31" d="M481 0L294 3Q136 3 109 0H96V62H227V304Q227 546 225 546Q169 529 97 529H80V591H97Q231 591 308 647L319 655H333Q355 655 359 644Q361 640 361 351V62H494V0H481Z"></path><path stroke-width="0" id="E1927-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1927-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 
-14T94 250Z"></path><path stroke-width="0" id="E1927-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E1927-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path><path stroke-width="0" id="E1927-MJMATHI-3B1" d="M34 156Q34 270 120 356T309 442Q379 442 421 402T478 304Q484 275 485 237V208Q534 282 560 374Q564 388 566 390T582 393Q603 393 603 385Q603 376 594 346T558 261T497 161L486 147L487 123Q489 67 495 47T514 26Q528 28 540 37T557 60Q559 67 562 68T577 70Q597 70 597 62Q597 56 591 43Q579 19 556 5T512 -10H505Q438 -10 414 62L411 69L400 61Q390 53 370 41T325 18T267 -2T203 -11Q124 -11 79 39T34 156ZM208 26Q257 26 306 47T379 90L403 112Q401 255 396 290Q382 405 304 405Q235 405 183 332Q156 292 139 224T121 120Q121 71 146 49T208 26Z"></path><path stroke-width="0" id="E1927-MJMATHI-3B3" d="M31 249Q11 249 11 258Q11 275 26 304T66 365T129 418T206 441Q233 441 239 440Q287 429 318 386T371 255Q385 195 385 170Q385 166 386 166L398 193Q418 244 443 300T486 391T508 430Q510 431 524 431H537Q543 425 543 422Q543 418 522 378T463 251T391 71Q385 55 378 6T357 -100Q341 -165 330 -190T303 -216Q286 -216 286 -188Q286 -138 340 32L346 51L347 69Q348 79 348 100Q348 257 291 317Q251 355 196 355Q148 355 108 329T51 260Q49 251 47 251Q45 249 31 249Z"></path><path stroke-width="0" id="E1927-MJMAIN-31" d="M213 578L200 573Q186 568 160 563T102 556H83V602H102Q149 604 189 617T245 641T273 663Q275 666 285 666Q294 666 302 
660V361L303 61Q310 54 315 52T339 48T401 46H427V0H416Q395 3 257 3Q121 3 100 0H88V46H114Q136 46 152 46T177 47T193 50T201 52T207 57T213 61V578Z"></path><path stroke-width="0" id="E1927-MJMAIN-2E" d="M78 60Q78 84 95 102T138 120Q162 120 180 104T199 61Q199 36 182 18T139 0T96 17T78 60Z"></path><path stroke-width="0" id="E1927-MJMAIN-2190" d="M944 261T944 250T929 230H165Q167 228 182 216T211 189T244 152T277 96T303 25Q308 7 308 0Q308 -11 288 -11Q281 -11 278 -11T272 -7T267 2T263 21Q245 94 195 151T73 236Q58 242 55 247Q55 254 59 257T73 264Q121 283 158 314T215 375T247 434T264 480L267 497Q269 503 270 505T275 509T288 511Q308 511 308 500Q308 493 303 475Q293 438 278 406T246 352T215 315T185 287T165 270H929Q944 261 944 250Z"></path><path stroke-width="0" id="E1927-MJMAIN-32" d="M109 429Q82 429 66 447T50 491Q50 562 103 614T235 666Q326 666 387 610T449 465Q449 422 429 383T381 315T301 241Q265 210 201 149L142 93L218 92Q375 92 385 97Q392 99 409 186V189H449V186Q448 183 436 95T421 3V0H50V19V31Q50 38 56 46T86 81Q115 113 136 137Q145 147 170 174T204 211T233 244T261 278T284 308T305 340T320 369T333 401T340 431T343 464Q343 527 309 573T212 619Q179 619 154 602T119 569T109 550Q109 549 114 549Q132 549 151 535T170 489Q170 464 154 447T109 429Z"></path><path stroke-width="0" id="E1927-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 462Q176 523 208 573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 577T585 459T569 456Q549 456 549 465Q549 471 550 475Q550 478 551 494T553 520Q553 554 544 579T526 616T501 641Q465 662 419 662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 -18T65 -22Q52 -22 52 -14Q52 -11 110 221Q112 227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 153Q144 114 160 87T203 47T255 29T308 
24Z"></path><path stroke-width="0" id="E1927-MJMAIN-30" d="M96 585Q152 666 249 666Q297 666 345 640T423 548Q460 465 460 320Q460 165 417 83Q397 41 362 16T301 -15T250 -22Q224 -22 198 -16T137 16T82 83Q39 165 39 320Q39 494 96 585ZM321 597Q291 629 250 629Q208 629 178 597Q153 571 145 525T137 333Q137 175 145 125T181 46Q209 16 250 16Q290 16 318 46Q347 76 354 130T362 333Q362 478 354 524T321 597Z"></path><path stroke-width="0" id="E1927-MJMAIN-2C" d="M78 35T78 60T94 103T137 121Q165 121 187 96T210 8Q210 -27 201 -60T180 -117T154 -158T130 -185T117 -194Q113 -194 104 -185T95 -172Q95 -168 106 -156T131 -126T157 -76T173 -3V9L172 8Q170 7 167 6T161 3T152 1T140 0Q113 0 96 17Z"></path><path stroke-width="0" id="E1927-MJMATHI-41" d="M208 74Q208 50 254 46Q272 46 272 35Q272 34 270 22Q267 8 264 4T251 0Q249 0 239 0T205 1T141 2Q70 2 50 0H42Q35 7 35 11Q37 38 48 46H62Q132 49 164 96Q170 102 345 401T523 704Q530 716 547 716H555H572Q578 707 578 706L606 383Q634 60 636 57Q641 46 701 46Q726 46 726 36Q726 34 723 22Q720 7 718 4T704 0Q701 0 690 0T651 1T578 2Q484 2 455 0H443Q437 6 437 9T439 27Q443 40 445 43L449 46H469Q523 49 533 63L521 213H283L249 155Q208 86 208 74ZM516 260Q516 271 504 416T490 562L463 519Q447 492 400 412L310 260L413 259Q516 259 516 260Z"></path><path stroke-width="0" id="E1927-MJMATHI-52" d="M230 637Q203 637 198 638T193 649Q193 676 204 682Q206 683 378 683Q550 682 564 680Q620 672 658 652T712 606T733 563T739 529Q739 484 710 445T643 385T576 351T538 338L545 333Q612 295 612 223Q612 212 607 162T602 80V71Q602 53 603 43T614 25T640 16Q668 16 686 38T712 85Q717 99 720 102T735 105Q755 105 755 93Q755 75 731 36Q693 -21 641 -21H632Q571 -21 531 4T487 82Q487 109 502 166T517 239Q517 290 474 313Q459 320 449 321T378 323H309L277 193Q244 61 244 59Q244 55 245 54T252 50T269 48T302 46H333Q339 38 339 37T336 19Q332 6 326 0H311Q275 2 180 2Q146 2 117 2T71 2T50 1Q33 1 33 10Q33 12 36 24Q41 43 46 45Q50 46 61 46H67Q94 46 127 49Q141 52 146 61Q149 65 218 339T287 628Q287 635 230 637ZM630 554Q630 586 609 608T523 636Q521 636 
500 636T462 637H440Q393 637 386 627Q385 624 352 494T319 361Q319 360 388 360Q466 361 492 367Q556 377 592 426Q608 449 619 486T630 554Z"></path><path stroke-width="0" id="E1927-MJMAIN-22EF" d="M78 250Q78 274 95 292T138 310Q162 310 180 294T199 251Q199 226 182 208T139 190T96 207T78 250ZM525 250Q525 274 542 292T585 310Q609 310 627 294T646 251Q646 226 629 208T586 190T543 207T525 250ZM972 250Q972 274 989 292T1032 310Q1056 310 1074 294T1093 251Q1093 226 1076 208T1033 190T990 207T972 250Z"></path><path stroke-width="0" id="E1927-MJMATHI-54" d="M40 437Q21 437 21 445Q21 450 37 501T71 602L88 651Q93 669 101 677H569H659Q691 677 697 676T704 667Q704 661 687 553T668 444Q668 437 649 437Q640 437 637 437T631 442L629 445Q629 451 635 490T641 551Q641 586 628 604T573 629Q568 630 515 631Q469 631 457 630T439 622Q438 621 368 343T298 60Q298 48 386 46Q418 46 427 45T436 36Q436 31 433 22Q429 4 424 1L422 0Q419 0 415 0Q410 0 363 1T228 2Q99 2 64 0H49Q43 6 43 9T45 27Q49 40 55 46H83H94Q174 46 189 55Q190 56 191 56Q196 59 201 76T241 233Q258 301 269 344Q339 619 339 625Q339 630 310 630H279Q212 630 191 624Q146 614 121 583T67 467Q60 445 57 441T43 437H40Z"></path><path stroke-width="0" id="E1927-MJMAIN-2212" d="M84 237T84 250T98 270H679Q694 262 694 250T679 230H98Q84 237 84 250Z"></path><path stroke-width="0" id="E1927-MJMATHI-47" d="M50 252Q50 367 117 473T286 641T490 704Q580 704 633 653Q642 643 648 636T656 626L657 623Q660 623 684 649Q691 655 699 663T715 679T725 690L740 705H746Q760 705 760 698Q760 694 728 561Q692 422 692 421Q690 416 687 415T669 413H653Q647 419 647 422Q647 423 648 429T650 449T651 481Q651 552 619 605T510 659Q492 659 471 656T418 643T357 615T294 567T236 496T189 394T158 260Q156 242 156 221Q156 173 170 136T206 79T256 45T308 28T353 24Q407 24 452 47T514 106Q517 114 529 161T541 214Q541 222 528 224T468 227H431Q425 233 425 235T427 254Q431 267 437 273H454Q494 271 594 271Q634 271 659 271T695 272T707 272Q721 272 721 263Q721 261 719 249Q714 230 709 228Q706 227 694 227Q674 227 653 224Q646 221 643 215T629 
164Q620 131 614 108Q589 6 586 3Q584 1 581 1Q571 1 553 21T530 52Q530 53 528 52T522 47Q448 -22 322 -22Q201 -22 126 55T50 252Z"></path><path stroke-width="0" id="E1927-MJMAIN-33" d="M127 463Q100 463 85 480T69 524Q69 579 117 622T233 665Q268 665 277 664Q351 652 390 611T430 522Q430 470 396 421T302 350L299 348Q299 347 308 345T337 336T375 315Q457 262 457 175Q457 96 395 37T238 -22Q158 -22 100 21T42 130Q42 158 60 175T105 193Q133 193 151 175T169 130Q169 119 166 110T159 94T148 82T136 74T126 70T118 67L114 66Q165 21 238 21Q293 21 321 74Q338 107 338 175V195Q338 290 274 322Q259 328 213 329L171 330L168 332Q166 335 166 348Q166 366 174 366Q202 366 232 371Q266 376 294 413T322 525V533Q322 590 287 612Q265 626 240 626Q208 626 181 615T143 592T132 580H135Q138 579 143 578T153 573T165 566T175 555T183 540T186 520Q186 498 172 481T127 463Z"></path><path stroke-width="0" id="E1927-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E1927-MJMAIN-2B" d="M56 237T56 250T70 270H369V420L370 570Q380 583 389 583Q402 583 409 568V270H707Q722 262 722 250T707 230H409V-68Q401 -82 391 -82H389H387Q375 -82 369 -68V230H70Q56 237 56 250Z"></path><path stroke-width="0" id="E1927-MJMAIN-6C" d="M42 46H56Q95 46 103 60V68Q103 77 103 91T103 124T104 167T104 217T104 272T104 329Q104 366 104 407T104 482T104 542T103 586T103 603Q100 622 89 628T44 637H26V660Q26 683 28 683L38 684Q48 685 67 686T104 688Q121 689 141 690T171 693T182 694H185V379Q185 62 186 60Q190 52 198 49Q219 46 247 46H263V0H255L232 1Q209 2 183 2T145 3T107 3T57 1L34 0H26V46H42Z"></path><path 
stroke-width="0" id="E1927-MJMAIN-6E" d="M41 46H55Q94 46 102 60V68Q102 77 102 91T102 122T103 161T103 203Q103 234 103 269T102 328V351Q99 370 88 376T43 385H25V408Q25 431 27 431L37 432Q47 433 65 434T102 436Q119 437 138 438T167 441T178 442H181V402Q181 364 182 364T187 369T199 384T218 402T247 421T285 437Q305 442 336 442Q450 438 463 329Q464 322 464 190V104Q464 66 466 59T477 49Q498 46 526 46H542V0H534L510 1Q487 2 460 2T422 3Q319 3 310 0H302V46H318Q379 46 379 62Q380 64 380 200Q379 335 378 343Q372 371 358 385T334 402T308 404Q263 404 229 370Q202 343 195 315T187 232V168V108Q187 78 188 68T191 55T200 49Q221 46 249 46H265V0H257L234 1Q210 2 183 2T145 3Q42 3 33 0H25V46H41Z"></path><path stroke-width="0" id="E1927-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path stroke-width="0" id="E1927-MJMAIN-3B" d="M78 370Q78 394 95 412T138 430Q162 430 180 414T199 371Q199 346 182 328T139 310T96 327T78 370ZM78 60Q78 85 94 103T137 121Q202 121 202 8Q202 -44 183 -94T144 -169T118 -194Q115 -194 106 -186T95 -174Q94 -171 107 -155T137 -107T160 -38Q161 -32 162 -22T165 -4T165 4Q165 5 161 4T142 0Q110 0 94 18T78 60Z"></path><path stroke-width="0" id="E1927-MJMAIN-2207" d="M46 676Q46 679 51 683H781Q786 679 786 676Q786 674 617 326T444 -26Q439 -33 416 -33T388 -26Q385 -22 216 326T46 676ZM697 596Q697 597 445 597T193 596Q195 591 319 336T445 80L697 596Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><g transform="translate(10828,-2462)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">算</text><g transform="translate(1052,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">法</text></g><use transform="scale(1.2)" xlink:href="#E1927-MJMAINB-37" x="1963" y="0"></use><use transform="scale(1.2)" 
xlink:href="#E1927-MJMAINB-2D" x="2538" y="0"></use><use transform="scale(1.2)" xlink:href="#E1927-MJMAINB-31" x="2921" y="0"></use><g transform="translate(4945,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">简</text></g><g transform="translate(5961,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">单</text></g><g transform="translate(7014,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">的</text></g><g transform="translate(8030,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">策</text></g><g transform="translate(9083,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">略</text></g><g transform="translate(10136,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">梯</text></g><g transform="translate(11189,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">度</text></g><g transform="translate(12242,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">算</text></g><g transform="translate(13294,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">法</text></g><g transform="translate(14347,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">求</text></g><g transform="translate(15400,0)"><text font-family="STIXGeneral, 
'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">解</text></g><g transform="translate(16453,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">最</text></g><g transform="translate(17506,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">优</text></g><g transform="translate(18559,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">策</text></g><g transform="translate(19611,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">略</text></g></g><g transform="translate(0,-11228)"><g transform="translate(-19,0)"><g transform="translate(0,7368)"><g><rect fill="black" stroke="none" width="1569" height="100" x="0" y="500"></rect></g></g><g transform="translate(0,-7169)"><g><rect fill="black" stroke="none" width="1569" height="100" x="0" y="-500"></rect></g></g></g><g transform="translate(1551,0)"><g transform="translate(0,7368)"><g><rect fill="black" stroke="none" width="41597" height="100" x="0" y="500"></rect></g></g><g transform="translate(0,6068)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">输</text><g transform="translate(830,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">入</text></g><g transform="translate(1661,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">：</text></g><g transform="translate(2491,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">环</text></g><g transform="translate(3322,0)"><text 
font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">境</text></g><g transform="translate(4153,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">（</text></g><g transform="translate(4983,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">无</text></g><g transform="translate(5814,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">数</text></g><g transform="translate(6645,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">学</text></g><g transform="translate(7475,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">描</text></g><g transform="translate(8306,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">述</text></g><g transform="translate(9137,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">）</text></g><g transform="translate(9967,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">。</text></g></g><g transform="translate(0,4768)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">输</text><g transform="translate(830,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">出</text></g><g transform="translate(1661,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">：</text></g><g transform="translate(2491,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" 
transform="scale(41.533) matrix(1 0 0 -1 0 0)">最</text></g><g transform="translate(3322,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">优</text></g><g transform="translate(4153,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">策</text></g><g transform="translate(4983,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">略</text></g><g transform="translate(5814,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">的</text></g><g transform="translate(6645,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">估</text></g><g transform="translate(7475,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">计</text></g><g transform="translate(8556,0)"><use xlink:href="#E1927-MJMATHI-3C0" x="0" y="0"></use><use xlink:href="#E1927-MJMAIN-28" x="573" y="0"></use><use xlink:href="#E1927-MJMATHI-3B8" x="962" y="0"></use><use xlink:href="#E1927-MJMAIN-29" x="1431" y="0"></use></g><g transform="translate(10376,0)"><g transform="translate(250,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">。</text></g></g></g><g transform="translate(0,3418)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">参</text><g transform="translate(830,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">数</text></g><g transform="translate(1661,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">：</text></g><g transform="translate(2491,0)"><text 
font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">优</text></g><g transform="translate(3322,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">化</text></g><g transform="translate(4153,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">器</text></g><g transform="translate(4983,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">（</text></g><g transform="translate(5814,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">隐</text></g><g transform="translate(6645,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">含</text></g><g transform="translate(7475,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">学</text></g><g transform="translate(8306,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">习</text></g><g transform="translate(9137,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">率</text></g><use xlink:href="#E1927-MJMATHI-3B1" x="10217" y="0"></use><g transform="translate(10857,0)"><g transform="translate(250,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">）</text></g><g transform="translate(1080,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">，</text></g><g transform="translate(1911,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">折</text></g><g 
transform="translate(2741,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">扣</text></g><g transform="translate(3572,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">因</text></g><g transform="translate(4403,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">子</text></g></g><use xlink:href="#E1927-MJMATHI-3B3" x="16341" y="0"></use><g transform="translate(16884,0)"><g transform="translate(250,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">，</text></g><g transform="translate(1080,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">控</text></g><g transform="translate(1911,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">制</text></g><g transform="translate(2741,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">回</text></g><g transform="translate(3572,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">合</text></g><g transform="translate(4403,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">数</text></g><g transform="translate(5233,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">和</text></g><g transform="translate(6064,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">回</text></g><g transform="translate(6895,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 
0)">合</text></g><g transform="translate(7725,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">内</text></g><g transform="translate(8556,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">步</text></g><g transform="translate(9387,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">数</text></g><g transform="translate(10217,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">的</text></g><g transform="translate(11048,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">参</text></g><g transform="translate(11879,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">数</text></g><g transform="translate(12709,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">。</text></g></g></g><g transform="translate(0,2102)"><use xlink:href="#E1927-MJMAIN-31"></use><use xlink:href="#E1927-MJMAIN-2E" x="500" y="0"></use><g transform="translate(778,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">（</text></g><g transform="translate(1608,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">初</text></g><g transform="translate(2439,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">始</text></g><g transform="translate(3269,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">化</text></g><g transform="translate(4100,0)"><text font-family="STIXGeneral, 'PingFang SC', 
serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">）</text></g><g transform="translate(4931,0)"><use xlink:href="#E1927-MJMATHI-3B8" x="0" y="0"></use><use xlink:href="#E1927-MJMAIN-2190" x="746" y="0"></use></g><g transform="translate(6678,0)"><g transform="translate(250,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">任</text></g><g transform="translate(1080,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">意</text></g><g transform="translate(1911,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">值</text></g><g transform="translate(2741,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">。</text></g></g></g><g transform="translate(0,802)"><use xlink:href="#E1927-MJMAIN-32"></use><use xlink:href="#E1927-MJMAIN-2E" x="500" y="0"></use><g transform="translate(778,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">（</text></g><g transform="translate(1608,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">回</text></g><g transform="translate(2439,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">合</text></g><g transform="translate(3269,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">更</text></g><g transform="translate(4100,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">新</text></g><g transform="translate(4931,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">）</text></g><g 
transform="translate(5761,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">对</text></g><g transform="translate(6592,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">于</text></g><g transform="translate(7423,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">每</text></g><g transform="translate(8253,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">个</text></g><g transform="translate(9084,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">回</text></g><g transform="translate(9915,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">合</text></g><g transform="translate(10745,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">执</text></g><g transform="translate(11576,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">行</text></g><g transform="translate(12407,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">以</text></g><g transform="translate(13237,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">下</text></g><g transform="translate(14068,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">操</text></g><g transform="translate(14899,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">作</text></g><g transform="translate(15729,0)"><text font-family="STIXGeneral, 'PingFang 
SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">：</text></g></g><g transform="translate(0,-498)"><g transform="translate(2000,0)"><use xlink:href="#E1927-MJMAIN-32"></use><use xlink:href="#E1927-MJMAIN-2E" x="500" y="0"></use><use xlink:href="#E1927-MJMAIN-31" x="778" y="0"></use><g transform="translate(1278,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">（</text></g><g transform="translate(2108,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">采</text></g><g transform="translate(2939,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">样</text></g><g transform="translate(3769,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">）</text></g><g transform="translate(4600,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">用</text></g><g transform="translate(5431,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">策</text></g><g transform="translate(6261,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">略</text></g><g transform="translate(7342,0)"><use xlink:href="#E1927-MJMATHI-3C0" x="0" y="0"></use><use xlink:href="#E1927-MJMAIN-28" x="573" y="0"></use><use xlink:href="#E1927-MJMATHI-3B8" x="962" y="0"></use><use xlink:href="#E1927-MJMAIN-29" x="1431" y="0"></use></g><g transform="translate(9162,0)"><g transform="translate(250,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">生</text></g><g transform="translate(1080,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" 
transform="scale(41.533) matrix(1 0 0 -1 0 0)">成</text></g><g transform="translate(1911,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">轨</text></g><g transform="translate(2741,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">迹</text></g></g><g transform="translate(12985,0)"><use xlink:href="#E1927-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1927-MJMAIN-30" x="866" y="-213"></use><use xlink:href="#E1927-MJMAIN-2C" x="1066" y="0"></use><g transform="translate(1511,0)"><use xlink:href="#E1927-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1927-MJMAIN-30" x="1060" y="-213"></use></g><use xlink:href="#E1927-MJMAIN-2C" x="2714" y="0"></use><g transform="translate(3159,0)"><use xlink:href="#E1927-MJMATHI-52" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1927-MJMAIN-31" x="1073" y="-213"></use></g><use xlink:href="#E1927-MJMAIN-2C" x="4371" y="0"></use><g transform="translate(4816,0)"><use xlink:href="#E1927-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1927-MJMAIN-31" x="866" y="-213"></use></g><use xlink:href="#E1927-MJMAIN-2C" x="5883" y="0"></use><use xlink:href="#E1927-MJMAIN-22EF" x="6327" y="0"></use><use xlink:href="#E1927-MJMAIN-2C" x="7666" y="0"></use><g transform="translate(8111,0)"><use xlink:href="#E1927-MJMATHI-53" x="0" y="0"></use><g transform="translate(613,-150)"><use transform="scale(0.707)" xlink:href="#E1927-MJMATHI-54" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1927-MJMAIN-2212" x="704" y="0"></use><use transform="scale(0.707)" xlink:href="#E1927-MJMAIN-31" x="1482" y="0"></use></g></g><use xlink:href="#E1927-MJMAIN-2C" x="10225" y="0"></use><g transform="translate(10670,0)"><use xlink:href="#E1927-MJMATHI-41" x="0" y="0"></use><g transform="translate(750,-150)"><use transform="scale(0.707)" 
xlink:href="#E1927-MJMATHI-54" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1927-MJMAIN-2212" x="704" y="0"></use><use transform="scale(0.707)" xlink:href="#E1927-MJMAIN-31" x="1482" y="0"></use></g></g><use xlink:href="#E1927-MJMAIN-2C" x="12921" y="0"></use><g transform="translate(13366,0)"><use xlink:href="#E1927-MJMATHI-52" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1927-MJMATHI-54" x="1073" y="-213"></use></g><use xlink:href="#E1927-MJMAIN-2C" x="14723" y="0"></use><g transform="translate(15167,0)"><use xlink:href="#E1927-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1927-MJMATHI-54" x="866" y="-213"></use></g></g><g transform="translate(29363,0)"><g transform="translate(250,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">。</text></g></g></g></g><g transform="translate(0,-1848)"><g transform="translate(2000,0)"><use xlink:href="#E1927-MJMAIN-32"></use><use xlink:href="#E1927-MJMAIN-2E" x="500" y="0"></use><use xlink:href="#E1927-MJMAIN-32" x="778" y="0"></use><g transform="translate(1278,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">（</text></g><g transform="translate(2108,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">初</text></g><g transform="translate(2939,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">始</text></g><g transform="translate(3769,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">化</text></g><g transform="translate(4600,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">回</text></g><g transform="translate(5431,0)"><text font-family="STIXGeneral, 'PingFang SC', 
serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">报</text></g><g transform="translate(6261,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">）</text></g><g transform="translate(7092,0)"><use xlink:href="#E1927-MJMATHI-47" x="0" y="0"></use><use xlink:href="#E1927-MJMAIN-2190" x="1063" y="0"></use><use xlink:href="#E1927-MJMAIN-30" x="2341" y="0"></use></g><g transform="translate(9934,0)"><g transform="translate(250,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">。</text></g></g></g></g><g transform="translate(0,-3148)"><g transform="translate(2000,0)"><use xlink:href="#E1927-MJMAIN-32"></use><use xlink:href="#E1927-MJMAIN-2E" x="500" y="0"></use><use xlink:href="#E1927-MJMAIN-33" x="778" y="0"></use><g transform="translate(1278,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">（</text></g><g transform="translate(2108,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">逐</text></g><g transform="translate(2939,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">步</text></g><g transform="translate(3769,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">更</text></g><g transform="translate(4600,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">新</text></g><g transform="translate(5431,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">）</text></g><g transform="translate(6261,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">对</text></g><g 
transform="translate(7342,0)"><use xlink:href="#E1927-MJMATHI-74" x="0" y="0"></use><use xlink:href="#E1927-MJMAIN-2190" x="638" y="0"></use><use xlink:href="#E1927-MJMATHI-54" x="1916" y="0"></use><use xlink:href="#E1927-MJMAIN-2212" x="2842" y="0"></use><use xlink:href="#E1927-MJMAIN-31" x="3843" y="0"></use><use xlink:href="#E1927-MJMAIN-2C" x="4343" y="0"></use><use xlink:href="#E1927-MJMATHI-54" x="4787" y="0"></use><use xlink:href="#E1927-MJMAIN-2212" x="5713" y="0"></use><use xlink:href="#E1927-MJMAIN-32" x="6714" y="0"></use><use xlink:href="#E1927-MJMAIN-2C" x="7214" y="0"></use><use xlink:href="#E1927-MJMAIN-22EF" x="7658" y="0"></use><use xlink:href="#E1927-MJMAIN-2C" x="8997" y="0"></use><use xlink:href="#E1927-MJMAIN-30" x="9442" y="0"></use></g><g transform="translate(17284,0)"><g transform="translate(250,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">，</text></g><g transform="translate(1080,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">执</text></g><g transform="translate(1911,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">行</text></g><g transform="translate(2741,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">以</text></g><g transform="translate(3572,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">下</text></g><g transform="translate(4403,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">步</text></g><g transform="translate(5233,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">骤</text></g><g transform="translate(6064,0)"><text font-family="STIXGeneral, 'PingFang 
SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">：</text></g></g></g></g><g transform="translate(0,-4448)"><g transform="translate(4000,0)"><use xlink:href="#E1927-MJMAIN-32"></use><use xlink:href="#E1927-MJMAIN-2E" x="500" y="0"></use><use xlink:href="#E1927-MJMAIN-33" x="778" y="0"></use><use xlink:href="#E1927-MJMAIN-2E" x="1278" y="0"></use><use xlink:href="#E1927-MJMAIN-31" x="1556" y="0"></use><g transform="translate(2056,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">（</text></g><g transform="translate(2886,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">更</text></g><g transform="translate(3717,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">新</text></g><g transform="translate(4547,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">回</text></g><g transform="translate(5378,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">报</text></g><g transform="translate(6209,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">）</text></g><g transform="translate(7039,0)"><use xlink:href="#E1927-MJMATHI-47" x="0" y="0"></use><use xlink:href="#E1927-MJMAIN-2190" x="1063" y="0"></use><use xlink:href="#E1927-MJMATHI-3B3" x="2341" y="0"></use><use xlink:href="#E1927-MJMATHI-47" x="2884" y="0"></use><use xlink:href="#E1927-MJMAIN-2B" x="3892" y="0"></use><g transform="translate(4893,0)"><use xlink:href="#E1927-MJMATHI-52" x="0" y="0"></use><g transform="translate(759,-150)"><use transform="scale(0.707)" xlink:href="#E1927-MJMATHI-74" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1927-MJMAIN-2B" x="361" y="0"></use><use 
transform="scale(0.707)" xlink:href="#E1927-MJMAIN-31" x="1139" y="0"></use></g></g></g><g transform="translate(13950,0)"><g transform="translate(250,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">；</text></g></g></g></g><g transform="translate(0,-5819)"><g transform="translate(4000,0)"><use xlink:href="#E1927-MJMAIN-32"></use><use xlink:href="#E1927-MJMAIN-2E" x="500" y="0"></use><use xlink:href="#E1927-MJMAIN-33" x="778" y="0"></use><use xlink:href="#E1927-MJMAIN-2E" x="1278" y="0"></use><use xlink:href="#E1927-MJMAIN-32" x="1556" y="0"></use><g transform="translate(2056,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">（</text></g><g transform="translate(2886,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">更</text></g><g transform="translate(3717,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">新</text></g><g transform="translate(4547,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">策</text></g><g transform="translate(5378,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">略</text></g><g transform="translate(6209,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">）</text></g><g transform="translate(7039,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">更</text></g><g transform="translate(7870,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">新</text></g><use xlink:href="#E1927-MJMATHI-3B8" x="8951" y="0"></use><g transform="translate(9420,0)"><g 
transform="translate(250,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">以</text></g><g transform="translate(1080,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">减</text></g><g transform="translate(1911,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">小</text></g></g><g transform="translate(12412,0)"><use xlink:href="#E1927-MJMAIN-2212" x="0" y="0"></use><g transform="translate(778,0)"><use xlink:href="#E1927-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1927-MJMATHI-74" x="778" y="583"></use></g><use xlink:href="#E1927-MJMATHI-47" x="1683" y="0"></use><g transform="translate(2636,0)"><use xlink:href="#E1927-MJMAIN-6C"></use><use xlink:href="#E1927-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1927-MJMATHI-3C0" x="3637" y="0"></use><use xlink:href="#E1927-MJMAIN-28" x="4210" y="0"></use><g transform="translate(4599,0)"><use xlink:href="#E1927-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1927-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1927-MJMAIN-2223" x="5982" y="0"></use><g transform="translate(6537,0)"><use xlink:href="#E1927-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1927-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1927-MJMAIN-3B" x="7506" y="0"></use><use xlink:href="#E1927-MJMATHI-3B8" x="7950" y="0"></use><use xlink:href="#E1927-MJMAIN-29" x="8419" y="0"></use></g><g transform="translate(21221,0)"><g transform="translate(250,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">，</text></g><g transform="translate(1080,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">如</text></g></g><g 
transform="translate(23382,0)"><use xlink:href="#E1927-MJMATHI-3B8" x="0" y="0"></use><use xlink:href="#E1927-MJMAIN-2190" x="746" y="0"></use><use xlink:href="#E1927-MJMATHI-3B8" x="2024" y="0"></use><use xlink:href="#E1927-MJMAIN-2B" x="2715" y="0"></use><use xlink:href="#E1927-MJMATHI-3B1" x="3716" y="0"></use><g transform="translate(4356,0)"><use xlink:href="#E1927-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1927-MJMATHI-74" x="778" y="583"></use></g><use xlink:href="#E1927-MJMATHI-47" x="5261" y="0"></use><use xlink:href="#E1927-MJMAIN-2207" x="6047" y="0"></use><g transform="translate(7047,0)"><use xlink:href="#E1927-MJMAIN-6C"></use><use xlink:href="#E1927-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1927-MJMATHI-3C0" x="8048" y="0"></use><use xlink:href="#E1927-MJMAIN-28" x="8621" y="0"></use><g transform="translate(9010,0)"><use xlink:href="#E1927-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1927-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1927-MJMAIN-2223" x="10393" y="0"></use><g transform="translate(10948,0)"><use xlink:href="#E1927-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1927-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1927-MJMAIN-3B" x="11917" y="0"></use><use xlink:href="#E1927-MJMATHI-3B8" x="12361" y="0"></use><use xlink:href="#E1927-MJMAIN-29" x="12830" y="0"></use></g><g transform="translate(36602,0)"><g transform="translate(250,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">。</text></g></g></g></g><g transform="translate(0,-7169)"><g><rect fill="black" stroke="none" width="41597" height="100" x="0" y="-500"></rect></g></g></g></g></g></svg></span></div><script type="math/tex; mode=display" id="MathJax-Element-825">\; \\ \; \\
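\large \textbf{Algorithm 7-1   Simple policy gradient algorithm for finding the optimal policy} \\
\begin{split}
\rule[5pt]{10mm}{0.1em} &\rule[5pt]{265mm}{0.1em} \\
&\text{Input: the environment (no mathematical model required).} \\
&\text{Output: an estimate $\pi(\theta)$ of the optimal policy.} \\
&\text{Parameters: optimizer (which implies the learning rate $\alpha$), discount factor $\gamma$, and parameters controlling the number of episodes and the number of steps per episode.} \\
&\text{1. (Initialize) $\theta \leftarrow$ arbitrary value.} \\
&\text{2. (Monte Carlo update) For each episode, do the following:} \\
&\qquad \text{2.1 (Sample) Use the policy $\pi(\theta)$ to generate the trajectory $S_0,A_0,R_1,S_1,\cdots,S_{T-1},A_{T-1},R_T,S_T$.} \\
&\qquad \text{2.2 (Initialize return) $G \leftarrow 0$.} \\
&\qquad \text{2.3 (Step-by-step update) For $t \leftarrow T-1,T-2,\cdots,0$, do the following:} \\
&\qquad \qquad \text{2.3.1 (Update return) $G \leftarrow \gamma G + R_{t+1}$;} \\
&\qquad \qquad \text{2.3.2 (Update policy) Update $\theta$ to reduce $-\gamma^t G \ln \pi(A_t \mid S_t; \theta)$, e.g. $\theta \leftarrow \theta + \alpha\gamma^t G \nabla \ln \pi(A_t \mid S_t; \theta)$.} \\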
\rule[-5pt]{10mm}{0.1em} &\rule[-5pt]{265mm}{0.1em}
\end{split}
\; \\ \; \\</script></div></div><p><span>Monte Carlo (episode-based) updates do not use bootstrapping, so they introduce no bias, but they often have very high variance. To reduce the variance, we introduce a baseline function </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="11.757ex" height="2.71ex" viewBox="0 -832.7 5062 1166.9" role="img" focusable="false" style="vertical-align: -0.776ex;"><defs><path stroke-width="0" id="E1961-MJMATHI-42" d="M231 637Q204 637 199 638T194 649Q194 676 205 682Q206 683 335 683Q594 683 608 681Q671 671 713 636T756 544Q756 480 698 429T565 360L555 357Q619 348 660 311T702 219Q702 146 630 78T453 1Q446 0 242 0Q42 0 39 2Q35 5 35 10Q35 17 37 24Q42 43 47 45Q51 46 62 46H68Q95 46 128 49Q142 52 147 61Q150 65 219 339T288 628Q288 635 231 637ZM649 544Q649 574 634 600T585 634Q578 636 493 637Q473 637 451 637T416 636H403Q388 635 384 626Q382 622 352 506Q352 503 351 500L320 374H401Q482 374 494 376Q554 386 601 434T649 544ZM595 229Q595 273 572 302T512 336Q506 337 429 337Q311 337 310 336Q310 334 293 263T258 122L240 52Q240 48 252 48T333 46Q422 46 429 47Q491 54 543 105T595 229Z"></path><path stroke-width="0" id="E1961-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1961-MJMATHI-73" d="M131 289Q131 321 147 354T203 415T300 442Q362 442 390 415T419 355Q419 323 402 308T364 292Q351 292 340 300T328 326Q328 342 337 354T354 372T367 378Q368 378 368 379Q368 382 361 388T336 399T297 405Q249 405 227 379T204 326Q204 301 223 291T278 274T330 259Q396 230 396 163Q396 135 385 107T352 51T289 7T195 -10Q118 -10 86 19T53 87Q53 126 74 143T118 160Q133 160 146 151T160 120Q160 94 142 76T111 58Q109 57 108 57T107 55Q108 52 115 47T146 34T201 27Q237 27 263 38T301 66T318 97T323 122Q323 150 302 164T254 181T195 196T148 231Q131 256 131 289Z"></path><path stroke-width="0" 
id="E1961-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path><path stroke-width="0" id="E1961-MJMAIN-2C" d="M78 35T78 60T94 103T137 121Q165 121 187 96T210 8Q210 -27 201 -60T180 -117T154 -158T130 -185T117 -194Q113 -194 104 -185T95 -172Q95 -168 106 -156T131 -126T157 -76T173 -3V9L172 8Q170 7 167 6T161 3T152 1T140 0Q113 0 96 17Z"></path><path stroke-width="0" id="E1961-MJMAIN-2208" d="M84 250Q84 372 166 450T360 539Q361 539 377 539T419 540T469 540H568Q583 532 583 520Q583 511 570 501L466 500Q355 499 329 494Q280 482 242 458T183 409T147 354T129 306T124 272V270H568Q583 262 583 250T568 230H124V228Q124 207 134 177T167 112T231 48T328 7Q355 1 466 0H570Q583 -10 583 -20Q583 -32 568 -40H471Q464 -40 446 -40T417 -41Q262 -41 172 45Q84 127 84 250Z"></path><path stroke-width="0" id="E1961-MJCAL-53" d="M554 512Q536 512 536 522Q536 525 539 539T542 564Q542 588 528 604Q515 616 482 625T410 635Q374 635 349 624T312 594T295 561T290 532Q290 505 303 482T342 442T378 419T409 404Q435 391 451 383T494 357T535 323T562 282T574 231Q574 133 464 56T220 -22Q138 -22 78 21T18 123Q18 184 61 227T156 274Q178 274 178 263Q178 260 177 258Q172 247 164 239T151 227T136 218L127 213L124 202Q118 186 118 163Q120 124 165 86T292 48Q374 48 423 86T473 186V193Q473 267 347 327Q268 364 239 389Q191 431 191 486Q191 547 242 600T356 679T470 705Q472 705 478 705T489 704Q551 704 596 682T642 610Q642 566 621 545Q592 516 554 512Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1961-MJMATHI-42" x="0" y="0"></use><use xlink:href="#E1961-MJMAIN-28" x="759" y="0"></use><use xlink:href="#E1961-MJMATHI-73" x="1148" y="0"></use><use xlink:href="#E1961-MJMAIN-29" x="1617" y="0"></use><use xlink:href="#E1961-MJMAIN-2C" x="2006" y="0"></use><use 
xlink:href="#E1961-MJMATHI-73" x="2728" y="0"></use><use xlink:href="#E1961-MJMAIN-2208" x="3475" y="0"></use><use xlink:href="#E1961-MJCAL-53" x="4420" y="0"></use></g></svg></span><script type="math/tex">B(s),\;s \in \mathcal S</script><span> to improve the simple policy gradient algorithm, which gives the simple policy gradient algorithm with a baseline (REINFORCE with baselines). The baseline can be any random or deterministic function; it may depend on the state </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="1.089ex" height="1.36ex" viewBox="0 -500.4 469 585.5" role="img" focusable="false" style="vertical-align: -0.198ex;"><defs><path stroke-width="0" id="E1962-MJMATHI-73" d="M131 289Q131 321 147 354T203 415T300 442Q362 442 390 415T419 355Q419 323 402 308T364 292Q351 292 340 300T328 326Q328 342 337 354T354 372T367 378Q368 378 368 379Q368 382 361 388T336 399T297 405Q249 405 227 379T204 326Q204 301 223 291T278 274T330 259Q396 230 396 163Q396 135 385 107T352 51T289 7T195 -10Q118 -10 86 19T53 87Q53 126 74 143T118 160Q133 160 146 151T160 120Q160 94 142 76T111 58Q109 57 108 57T107 55Q108 52 115 47T146 34T201 27Q237 27 263 38T301 66T318 97T323 122Q323 150 302 164T254 181T195 196T148 231Q131 256 131 289Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1962-MJMATHI-73" x="0" y="0"></use></g></svg></span><script type="math/tex">s</script><span> but must not depend on the action </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="1.229ex" height="1.36ex" viewBox="0 -500.4 529 585.5" role="img" focusable="false" style="vertical-align: -0.198ex;"><defs><path stroke-width="0" id="E1963-MJMATHI-61" d="M33 157Q33 258 109 349T280 441Q331 441 370 392Q386 422 416 422Q429 422 439 414T449 394Q449 381 412 234T374 68Q374 43 381 35T402 26Q411 27 422 35Q443 55 463 131Q469 151 473 152Q475 153 483 153H487Q506 153 506 144Q506 138 501 117T481 63T449 13Q436 0 417 -8Q409 -10 393 
-10Q359 -10 336 5T306 36L300 51Q299 52 296 50Q294 48 292 46Q233 -10 172 -10Q117 -10 75 30T33 157ZM351 328Q351 334 346 350T323 385T277 405Q242 405 210 374T160 293Q131 214 119 129Q119 126 119 118T118 106Q118 61 136 44T179 26Q217 26 254 59T298 110Q300 114 325 217T351 328Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1963-MJMATHI-61" x="0" y="0"></use></g></svg></span><script type="math/tex">a</script><span>. Under these conditions, the baseline automatically satisfies:</span></p><div contenteditable="false" spellcheck="false" class="mathjax-block md-end-block md-math-block md-rawblock" id="mathjax-n22" cid="n22" mdtype="math_block"><div class="md-rawblock-container md-math-container" tabindex="-1"><div class="MathJax_SVG_Display"><span class="MathJax_SVG" id="MathJax-Element-826-Frame" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="98.296ex" height="3.096ex" viewBox="0 -915.7 42321.7 1333" role="img" focusable="false" style="vertical-align: -0.969ex; max-width: 100%;"><defs><path stroke-width="0" id="E1928-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1928-MJMAIN-34" d="M462 0Q444 3 333 3Q217 3 199 0H190V46H221Q241 46 248 46T265 48T279 53T286 61Q287 63 287 115V165H28V211L179 442Q332 674 334 675Q336 677 355 677H373L379 671V211H471V165H379V114Q379 73 379 66T385 54Q393 47 442 46H471V0H462ZM293 211V545L74 212L183 211H293Z"></path><path stroke-width="0" id="E1928-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 
749Z"></path><path stroke-width="0" id="E1928-MJMATHI-45" d="M492 213Q472 213 472 226Q472 230 477 250T482 285Q482 316 461 323T364 330H312Q311 328 277 192T243 52Q243 48 254 48T334 46Q428 46 458 48T518 61Q567 77 599 117T670 248Q680 270 683 272Q690 274 698 274Q718 274 718 261Q613 7 608 2Q605 0 322 0H133Q31 0 31 11Q31 13 34 25Q38 41 42 43T65 46Q92 46 125 49Q139 52 144 61Q146 66 215 342T285 622Q285 629 281 629Q273 632 228 634H197Q191 640 191 642T193 659Q197 676 203 680H757Q764 676 764 669Q764 664 751 557T737 447Q735 440 717 440H705Q698 445 698 453L701 476Q704 500 704 528Q704 558 697 578T678 609T643 625T596 632T532 634H485Q397 633 392 631Q388 629 386 622Q385 619 355 499T324 377Q347 376 372 376H398Q464 376 489 391T534 472Q538 488 540 490T557 493Q562 493 565 493T570 492T572 491T574 487T577 483L544 351Q511 218 508 216Q505 213 492 213Z"></path><path stroke-width="0" id="E1928-MJMAIN-5B" d="M118 -250V750H255V710H158V-210H255V-250H118Z"></path><path stroke-width="0" id="E1928-MJMATHI-3B3" d="M31 249Q11 249 11 258Q11 275 26 304T66 365T129 418T206 441Q233 441 239 440Q287 429 318 386T371 255Q385 195 385 170Q385 166 386 166L398 193Q418 244 443 300T486 391T508 430Q510 431 524 431H537Q543 425 543 422Q543 418 522 378T463 251T391 71Q385 55 378 6T357 -100Q341 -165 330 -190T303 -216Q286 -216 286 -188Q286 -138 340 32L346 51L347 69Q348 79 348 100Q348 257 291 317Q251 355 196 355Q148 355 108 329T51 260Q49 251 47 251Q45 249 31 249Z"></path><path stroke-width="0" id="E1928-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" 
id="E1928-MJMATHI-47" d="M50 252Q50 367 117 473T286 641T490 704Q580 704 633 653Q642 643 648 636T656 626L657 623Q660 623 684 649Q691 655 699 663T715 679T725 690L740 705H746Q760 705 760 698Q760 694 728 561Q692 422 692 421Q690 416 687 415T669 413H653Q647 419 647 422Q647 423 648 429T650 449T651 481Q651 552 619 605T510 659Q492 659 471 656T418 643T357 615T294 567T236 496T189 394T158 260Q156 242 156 221Q156 173 170 136T206 79T256 45T308 28T353 24Q407 24 452 47T514 106Q517 114 529 161T541 214Q541 222 528 224T468 227H431Q425 233 425 235T427 254Q431 267 437 273H454Q494 271 594 271Q634 271 659 271T695 272T707 272Q721 272 721 263Q721 261 719 249Q714 230 709 228Q706 227 694 227Q674 227 653 224Q646 221 643 215T629 164Q620 131 614 108Q589 6 586 3Q584 1 581 1Q571 1 553 21T530 52Q530 53 528 52T522 47Q448 -22 322 -22Q201 -22 126 55T50 252Z"></path><path stroke-width="0" id="E1928-MJMAIN-2212" d="M84 237T84 250T98 270H679Q694 262 694 250T679 230H98Q84 237 84 250Z"></path><path stroke-width="0" id="E1928-MJMATHI-42" d="M231 637Q204 637 199 638T194 649Q194 676 205 682Q206 683 335 683Q594 683 608 681Q671 671 713 636T756 544Q756 480 698 429T565 360L555 357Q619 348 660 311T702 219Q702 146 630 78T453 1Q446 0 242 0Q42 0 39 2Q35 5 35 10Q35 17 37 24Q42 43 47 45Q51 46 62 46H68Q95 46 128 49Q142 52 147 61Q150 65 219 339T288 628Q288 635 231 637ZM649 544Q649 574 634 600T585 634Q578 636 493 637Q473 637 451 637T416 636H403Q388 635 384 626Q382 622 352 506Q352 503 351 500L320 374H401Q482 374 494 376Q554 386 601 434T649 544ZM595 229Q595 273 572 302T512 336Q506 337 429 337Q311 337 310 336Q310 334 293 263T258 122L240 52Q240 48 252 48T333 46Q422 46 429 47Q491 54 543 105T595 229Z"></path><path stroke-width="0" id="E1928-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 462Q176 523 208 573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 577T585 459T569 456Q549 456 549 465Q549 
471 550 475Q550 478 551 494T553 520Q553 554 544 579T526 616T501 641Q465 662 419 662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 -18T65 -22Q52 -22 52 -14Q52 -11 110 221Q112 227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 153Q144 114 160 87T203 47T255 29T308 24Z"></path><path stroke-width="0" id="E1928-MJMAIN-2207" d="M46 676Q46 679 51 683H781Q786 679 786 676Q786 674 617 326T444 -26Q439 -33 416 -33T388 -26Q385 -22 216 326T46 676ZM697 596Q697 597 445 597T193 596Q195 591 319 336T445 80L697 596Z"></path><path stroke-width="0" id="E1928-MJMAIN-6C" d="M42 46H56Q95 46 103 60V68Q103 77 103 91T103 124T104 167T104 217T104 272T104 329Q104 366 104 407T104 482T104 542T103 586T103 603Q100 622 89 628T44 637H26V660Q26 683 28 683L38 684Q48 685 67 686T104 688Q121 689 141 690T171 693T182 694H185V379Q185 62 186 60Q190 52 198 49Q219 46 247 46H263V0H255L232 1Q209 2 183 2T145 3T107 3T57 1L34 0H26V46H42Z"></path><path stroke-width="0" id="E1928-MJMAIN-6E" d="M41 46H55Q94 46 102 60V68Q102 77 102 91T102 122T103 161T103 203Q103 234 103 269T102 328V351Q99 370 88 376T43 385H25V408Q25 431 27 431L37 432Q47 433 65 434T102 436Q119 437 138 438T167 441T178 442H181V402Q181 364 182 364T187 369T199 384T218 402T247 421T285 437Q305 442 336 442Q450 438 463 329Q464 322 464 190V104Q464 66 466 59T477 49Q498 46 526 46H542V0H534L510 1Q487 2 460 2T422 3Q319 3 310 0H302V46H318Q379 46 379 62Q380 64 380 200Q379 335 378 343Q372 371 358 385T334 402T308 404Q263 404 229 370Q202 343 195 315T187 232V168V108Q187 78 188 68T191 55T200 49Q221 46 249 46H265V0H257L234 1Q210 2 183 2T145 3Q42 3 33 0H25V46H41Z"></path><path stroke-width="0" id="E1928-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 
430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1928-MJMATHI-41" d="M208 74Q208 50 254 46Q272 46 272 35Q272 34 270 22Q267 8 264 4T251 0Q249 0 239 0T205 1T141 2Q70 2 50 0H42Q35 7 35 11Q37 38 48 46H62Q132 49 164 96Q170 102 345 401T523 704Q530 716 547 716H555H572Q578 707 578 706L606 383Q634 60 636 57Q641 46 701 46Q726 46 726 36Q726 34 723 22Q720 7 718 4T704 0Q701 0 690 0T651 1T578 2Q484 2 455 0H443Q437 6 437 9T439 27Q443 40 445 43L449 46H469Q523 49 533 63L521 213H283L249 155Q208 86 208 74ZM516 260Q516 271 504 416T490 562L463 519Q447 492 400 412L310 260L413 259Q516 259 516 260Z"></path><path stroke-width="0" id="E1928-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path stroke-width="0" id="E1928-MJMAIN-3B" d="M78 370Q78 394 95 412T138 430Q162 430 180 414T199 371Q199 346 182 328T139 310T96 327T78 370ZM78 60Q78 85 94 103T137 121Q202 121 202 8Q202 -44 183 -94T144 -169T118 -194Q115 -194 106 -186T95 -174Q94 -171 107 -155T137 -107T160 -38Q161 -32 162 -22T165 -4T165 4Q165 5 161 4T142 0Q110 0 94 18T78 60Z"></path><path stroke-width="0" id="E1928-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E1928-MJMAIN-5D" d="M22 710V750H159V-250H22V-210H119V710H22Z"></path><path stroke-width="0" id="E1928-MJSZ1-5B" d="M202 
-349V850H394V810H242V-309H394V-349H202Z"></path><path stroke-width="0" id="E1928-MJSZ1-5D" d="M22 810V850H214V-349H22V-309H174V810H22Z"></path><path stroke-width="0" id="E1928-MJMAIN-3D" d="M56 347Q56 360 70 367H707Q722 359 722 347Q722 336 708 328L390 327H72Q56 332 56 347ZM56 153Q56 168 72 173H708Q722 163 722 153Q722 140 707 133H70Q56 140 56 153Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><g transform="translate(41043,0)"><g id="mjx-eqn-4" transform="translate(0,-4)"><use xlink:href="#E1928-MJMAIN-28"></use><use xlink:href="#E1928-MJMAIN-34" x="389" y="0"></use><use xlink:href="#E1928-MJMAIN-29" x="889" y="0"></use></g></g><g transform="translate(7433,0)"><g transform="translate(-19,0)"><g transform="translate(0,-4)"><use xlink:href="#E1928-MJMATHI-45" x="0" y="0"></use><g transform="translate(930,0)"><use xlink:href="#E1928-MJSZ1-5B"></use><g transform="translate(417,0)"><use xlink:href="#E1928-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1928-MJMATHI-74" x="778" y="583"></use></g><use xlink:href="#E1928-MJMAIN-28" x="1322" y="0"></use><g transform="translate(1711,0)"><use xlink:href="#E1928-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1928-MJMATHI-74" x="1111" y="-213"></use></g><use xlink:href="#E1928-MJMAIN-2212" x="3075" y="0"></use><use xlink:href="#E1928-MJMATHI-42" x="4075" y="0"></use><use xlink:href="#E1928-MJMAIN-28" x="4834" y="0"></use><g transform="translate(5223,0)"><use xlink:href="#E1928-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1928-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1928-MJMAIN-29" x="6191" y="0"></use><use xlink:href="#E1928-MJMAIN-29" x="6580" y="0"></use><use xlink:href="#E1928-MJMAIN-2207" x="6969" y="0"></use><g transform="translate(7969,0)"><use xlink:href="#E1928-MJMAIN-6C"></use><use xlink:href="#E1928-MJMAIN-6E" x="278" y="0"></use></g><use 
xlink:href="#E1928-MJMATHI-3C0" x="8970" y="0"></use><use xlink:href="#E1928-MJMAIN-28" x="9543" y="0"></use><g transform="translate(9932,0)"><use xlink:href="#E1928-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1928-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1928-MJMAIN-2223" x="11315" y="0"></use><g transform="translate(11870,0)"><use xlink:href="#E1928-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1928-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1928-MJMAIN-3B" x="12839" y="0"></use><use xlink:href="#E1928-MJMATHI-3B8" x="13283" y="0"></use><use xlink:href="#E1928-MJMAIN-29" x="13752" y="0"></use><use xlink:href="#E1928-MJSZ1-5D" x="14141" y="-1"></use></g><use xlink:href="#E1928-MJMAIN-3D" x="15767" y="0"></use><use xlink:href="#E1928-MJMATHI-45" x="16823" y="0"></use><g transform="translate(17753,0)"><use xlink:href="#E1928-MJSZ1-5B"></use><g transform="translate(417,0)"><use xlink:href="#E1928-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1928-MJMATHI-74" x="778" y="583"></use></g><g transform="translate(1322,0)"><use xlink:href="#E1928-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1928-MJMATHI-74" x="1111" y="-213"></use></g><use xlink:href="#E1928-MJMAIN-2207" x="2464" y="0"></use><g transform="translate(3463,0)"><use xlink:href="#E1928-MJMAIN-6C"></use><use xlink:href="#E1928-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1928-MJMATHI-3C0" x="4464" y="0"></use><use xlink:href="#E1928-MJMAIN-28" x="5037" y="0"></use><g transform="translate(5426,0)"><use xlink:href="#E1928-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1928-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1928-MJMAIN-2223" x="6809" y="0"></use><g transform="translate(7365,0)"><use xlink:href="#E1928-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1928-MJMATHI-74" x="866" 
y="-213"></use></g><use xlink:href="#E1928-MJMAIN-3B" x="8333" y="0"></use><use xlink:href="#E1928-MJMATHI-3B8" x="8778" y="0"></use><use xlink:href="#E1928-MJMAIN-29" x="9247" y="0"></use><use xlink:href="#E1928-MJSZ1-5D" x="9636" y="-1"></use></g></g></g></g></g></svg></span></div><script type="math/tex; mode=display" id="MathJax-Element-826">E\left[\gamma^t(G_t - B(S_t)) \nabla\ln\pi(A_t \mid S_t; \theta)\right] = E\left[\gamma^t G_t \nabla\ln\pi(A_t \mid S_t; \theta)\right]</script></div></div><p><span>Proof:</span></p><div contenteditable="false" spellcheck="false" class="mathjax-block md-end-block md-math-block md-rawblock" id="mathjax-n24" cid="n24" mdtype="math_block"><div class="md-rawblock-container md-math-container" tabindex="-1"><div class="MathJax_SVG_Display" style="text-align: center;"><span class="MathJax_SVG" id="MathJax-Element-827-Frame" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="52.433ex" height="22.967ex" viewBox="0 -5193.6 22575.1 9888.7" role="img" focusable="false" style="vertical-align: -10.721ex; margin-bottom: -0.184ex; max-width: 100%;"><defs><path stroke-width="0" id="E1929-MJMATHI-45" d="M492 213Q472 213 472 226Q472 230 477 250T482 285Q482 316 461 323T364 330H312Q311 328 277 192T243 52Q243 48 254 48T334 46Q428 46 458 48T518 61Q567 77 599 117T670 248Q680 270 683 272Q690 274 698 274Q718 274 718 261Q613 7 608 2Q605 0 322 0H133Q31 0 31 11Q31 13 34 25Q38 41 42 43T65 46Q92 46 125 49Q139 52 144 61Q146 66 215 342T285 622Q285 629 281 629Q273 632 228 634H197Q191 640 191 642T193 659Q197 676 203 680H757Q764 676 764 669Q764 664 751 557T737 447Q735 440 717 440H705Q698 445 698 453L701 476Q704 500 704 528Q704 558 697 578T678 609T643 625T596 632T532 634H485Q397 633 392 631Q388 629 386 622Q385 619 355 499T324 377Q347 376 372 376H398Q464 376 489 391T534 472Q538 488 540 490T557 493Q562 493 565 493T570 492T572 491T574 487T577 483L544 351Q511 218 508 216Q505 213 492 213Z"></path><path 
stroke-width="0" id="E1929-MJMAIN-5B" d="M118 -250V750H255V710H158V-210H255V-250H118Z"></path><path stroke-width="0" id="E1929-MJMATHI-3B3" d="M31 249Q11 249 11 258Q11 275 26 304T66 365T129 418T206 441Q233 441 239 440Q287 429 318 386T371 255Q385 195 385 170Q385 166 386 166L398 193Q418 244 443 300T486 391T508 430Q510 431 524 431H537Q543 425 543 422Q543 418 522 378T463 251T391 71Q385 55 378 6T357 -100Q341 -165 330 -190T303 -216Q286 -216 286 -188Q286 -138 340 32L346 51L347 69Q348 79 348 100Q348 257 291 317Q251 355 196 355Q148 355 108 329T51 260Q49 251 47 251Q45 249 31 249Z"></path><path stroke-width="0" id="E1929-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E1929-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1929-MJMATHI-47" d="M50 252Q50 367 117 473T286 641T490 704Q580 704 633 653Q642 643 648 636T656 626L657 623Q660 623 684 649Q691 655 699 663T715 679T725 690L740 705H746Q760 705 760 698Q760 694 728 561Q692 422 692 421Q690 416 687 415T669 413H653Q647 419 647 422Q647 423 648 429T650 449T651 481Q651 552 619 605T510 659Q492 659 471 656T418 643T357 615T294 567T236 496T189 394T158 260Q156 242 156 221Q156 173 170 136T206 79T256 45T308 28T353 24Q407 24 452 47T514 106Q517 114 529 161T541 214Q541 222 528 224T468 227H431Q425 233 425 235T427 254Q431 267 
437 273H454Q494 271 594 271Q634 271 659 271T695 272T707 272Q721 272 721 263Q721 261 719 249Q714 230 709 228Q706 227 694 227Q674 227 653 224Q646 221 643 215T629 164Q620 131 614 108Q589 6 586 3Q584 1 581 1Q571 1 553 21T530 52Q530 53 528 52T522 47Q448 -22 322 -22Q201 -22 126 55T50 252Z"></path><path stroke-width="0" id="E1929-MJMAIN-2212" d="M84 237T84 250T98 270H679Q694 262 694 250T679 230H98Q84 237 84 250Z"></path><path stroke-width="0" id="E1929-MJMATHI-42" d="M231 637Q204 637 199 638T194 649Q194 676 205 682Q206 683 335 683Q594 683 608 681Q671 671 713 636T756 544Q756 480 698 429T565 360L555 357Q619 348 660 311T702 219Q702 146 630 78T453 1Q446 0 242 0Q42 0 39 2Q35 5 35 10Q35 17 37 24Q42 43 47 45Q51 46 62 46H68Q95 46 128 49Q142 52 147 61Q150 65 219 339T288 628Q288 635 231 637ZM649 544Q649 574 634 600T585 634Q578 636 493 637Q473 637 451 637T416 636H403Q388 635 384 626Q382 622 352 506Q352 503 351 500L320 374H401Q482 374 494 376Q554 386 601 434T649 544ZM595 229Q595 273 572 302T512 336Q506 337 429 337Q311 337 310 336Q310 334 293 263T258 122L240 52Q240 48 252 48T333 46Q422 46 429 47Q491 54 543 105T595 229Z"></path><path stroke-width="0" id="E1929-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 462Q176 523 208 573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 577T585 459T569 456Q549 456 549 465Q549 471 550 475Q550 478 551 494T553 520Q553 554 544 579T526 616T501 641Q465 662 419 662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 -18T65 -22Q52 -22 52 -14Q52 -11 110 221Q112 227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 153Q144 114 160 87T203 47T255 29T308 24Z"></path><path stroke-width="0" id="E1929-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 
250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path><path stroke-width="0" id="E1929-MJMAIN-2207" d="M46 676Q46 679 51 683H781Q786 679 786 676Q786 674 617 326T444 -26Q439 -33 416 -33T388 -26Q385 -22 216 326T46 676ZM697 596Q697 597 445 597T193 596Q195 591 319 336T445 80L697 596Z"></path><path stroke-width="0" id="E1929-MJMAIN-6C" d="M42 46H56Q95 46 103 60V68Q103 77 103 91T103 124T104 167T104 217T104 272T104 329Q104 366 104 407T104 482T104 542T103 586T103 603Q100 622 89 628T44 637H26V660Q26 683 28 683L38 684Q48 685 67 686T104 688Q121 689 141 690T171 693T182 694H185V379Q185 62 186 60Q190 52 198 49Q219 46 247 46H263V0H255L232 1Q209 2 183 2T145 3T107 3T57 1L34 0H26V46H42Z"></path><path stroke-width="0" id="E1929-MJMAIN-6E" d="M41 46H55Q94 46 102 60V68Q102 77 102 91T102 122T103 161T103 203Q103 234 103 269T102 328V351Q99 370 88 376T43 385H25V408Q25 431 27 431L37 432Q47 433 65 434T102 436Q119 437 138 438T167 441T178 442H181V402Q181 364 182 364T187 369T199 384T218 402T247 421T285 437Q305 442 336 442Q450 438 463 329Q464 322 464 190V104Q464 66 466 59T477 49Q498 46 526 46H542V0H534L510 1Q487 2 460 2T422 3Q319 3 310 0H302V46H318Q379 46 379 62Q380 64 380 200Q379 335 378 343Q372 371 358 385T334 402T308 404Q263 404 229 370Q202 343 195 315T187 232V168V108Q187 78 188 68T191 55T200 49Q221 46 249 46H265V0H257L234 1Q210 2 183 2T145 3Q42 3 33 0H25V46H41Z"></path><path stroke-width="0" id="E1929-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 
114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1929-MJMATHI-41" d="M208 74Q208 50 254 46Q272 46 272 35Q272 34 270 22Q267 8 264 4T251 0Q249 0 239 0T205 1T141 2Q70 2 50 0H42Q35 7 35 11Q37 38 48 46H62Q132 49 164 96Q170 102 345 401T523 704Q530 716 547 716H555H572Q578 707 578 706L606 383Q634 60 636 57Q641 46 701 46Q726 46 726 36Q726 34 723 22Q720 7 718 4T704 0Q701 0 690 0T651 1T578 2Q484 2 455 0H443Q437 6 437 9T439 27Q443 40 445 43L449 46H469Q523 49 533 63L521 213H283L249 155Q208 86 208 74ZM516 260Q516 271 504 416T490 562L463 519Q447 492 400 412L310 260L413 259Q516 259 516 260Z"></path><path stroke-width="0" id="E1929-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path stroke-width="0" id="E1929-MJMAIN-3B" d="M78 370Q78 394 95 412T138 430Q162 430 180 414T199 371Q199 346 182 328T139 310T96 327T78 370ZM78 60Q78 85 94 103T137 121Q202 121 202 8Q202 -44 183 -94T144 -169T118 -194Q115 -194 106 -186T95 -174Q94 -171 107 -155T137 -107T160 -38Q161 -32 162 -22T165 -4T165 4Q165 5 161 4T142 0Q110 0 94 18T78 60Z"></path><path stroke-width="0" id="E1929-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E1929-MJMAIN-5D" d="M22 710V750H159V-250H22V-210H119V710H22Z"></path><path stroke-width="0" id="E1929-MJMAIN-3D" d="M56 347Q56 360 70 367H707Q722 359 722 347Q722 336 708 328L390 327H72Q56 332 56 347ZM56 153Q56 168 72 173H708Q722 163 722 153Q722 140 707 133H70Q56 140 56 153Z"></path><path stroke-width="0" id="E1929-MJSZ2-2211" d="M60 948Q63 950 665 950H1267L1325 815Q1384 677 1388 669H1348L1341 683Q1320 724 
1285 761Q1235 809 1174 838T1033 881T882 898T699 902H574H543H251L259 891Q722 258 724 252Q725 250 724 246Q721 243 460 -56L196 -356Q196 -357 407 -357Q459 -357 548 -357T676 -358Q812 -358 896 -353T1063 -332T1204 -283T1307 -196Q1328 -170 1348 -124H1388Q1388 -125 1381 -145T1356 -210T1325 -294L1267 -449L666 -450Q64 -450 61 -448Q55 -446 55 -439Q55 -437 57 -433L590 177Q590 178 557 222T452 366T322 544L56 909L55 924Q55 945 60 948Z"></path><path stroke-width="0" id="E1929-MJMATHI-61" d="M33 157Q33 258 109 349T280 441Q331 441 370 392Q386 422 416 422Q429 422 439 414T449 394Q449 381 412 234T374 68Q374 43 381 35T402 26Q411 27 422 35Q443 55 463 131Q469 151 473 152Q475 153 483 153H487Q506 153 506 144Q506 138 501 117T481 63T449 13Q436 0 417 -8Q409 -10 393 -10Q359 -10 336 5T306 36L300 51Q299 52 296 50Q294 48 292 46Q233 -10 172 -10Q117 -10 75 30T33 157ZM351 328Q351 334 346 350T323 385T277 405Q242 405 210 374T160 293Q131 214 119 129Q119 126 119 118T118 106Q118 61 136 44T179 26Q217 26 254 59T298 110Q300 114 325 217T351 328Z"></path><path stroke-width="0" id="E1929-MJMAIN-31" d="M213 578L200 573Q186 568 160 563T102 556H83V602H102Q149 604 189 617T245 641T273 663Q275 666 285 666Q294 666 302 660V361L303 61Q310 54 315 52T339 48T401 46H427V0H416Q395 3 257 3Q121 3 100 0H88V46H114Q136 46 152 46T177 47T193 50T201 52T207 57T213 61V578Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><g transform="translate(167,0)"><g transform="translate(-19,0)"><g transform="translate(0,4260)"><use xlink:href="#E1929-MJMATHI-45" x="0" y="0"></use><use xlink:href="#E1929-MJMAIN-5B" x="764" y="0"></use><g transform="translate(1042,0)"><use xlink:href="#E1929-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1929-MJMATHI-74" x="778" y="583"></use></g><use xlink:href="#E1929-MJMAIN-28" x="1947" y="0"></use><g transform="translate(2336,0)"><use xlink:href="#E1929-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" 
xlink:href="#E1929-MJMATHI-74" x="1111" y="-213"></use></g><use xlink:href="#E1929-MJMAIN-2212" x="3700" y="0"></use><use xlink:href="#E1929-MJMATHI-42" x="4700" y="0"></use><use xlink:href="#E1929-MJMAIN-28" x="5459" y="0"></use><g transform="translate(5848,0)"><use xlink:href="#E1929-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1929-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1929-MJMAIN-29" x="6816" y="0"></use><use xlink:href="#E1929-MJMAIN-29" x="7205" y="0"></use><use xlink:href="#E1929-MJMAIN-2207" x="7594" y="0"></use><g transform="translate(8594,0)"><use xlink:href="#E1929-MJMAIN-6C"></use><use xlink:href="#E1929-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1929-MJMATHI-3C0" x="9595" y="0"></use><use xlink:href="#E1929-MJMAIN-28" x="10168" y="0"></use><g transform="translate(10557,0)"><use xlink:href="#E1929-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1929-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1929-MJMAIN-2223" x="11940" y="0"></use><g transform="translate(12495,0)"><use xlink:href="#E1929-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1929-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1929-MJMAIN-3B" x="13464" y="0"></use><use xlink:href="#E1929-MJMATHI-3B8" x="13908" y="0"></use><use xlink:href="#E1929-MJMAIN-29" x="14377" y="0"></use><use xlink:href="#E1929-MJMAIN-5D" x="14766" y="0"></use></g><g transform="translate(0,2760)"><use xlink:href="#E1929-MJMAIN-3D" x="277" y="0"></use><g transform="translate(1333,0)"><use xlink:href="#E1929-MJSZ2-2211" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1929-MJMATHI-61" x="756" y="-1485"></use></g><g transform="translate(2944,0)"><use xlink:href="#E1929-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1929-MJMATHI-74" x="778" y="583"></use></g><use xlink:href="#E1929-MJMAIN-28" x="3850" y="0"></use><g 
transform="translate(4239,0)"><use xlink:href="#E1929-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1929-MJMATHI-74" x="1111" y="-213"></use></g><use xlink:href="#E1929-MJMAIN-2212" x="5602" y="0"></use><use xlink:href="#E1929-MJMATHI-42" x="6602" y="0"></use><use xlink:href="#E1929-MJMAIN-28" x="7361" y="0"></use><g transform="translate(7750,0)"><use xlink:href="#E1929-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1929-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1929-MJMAIN-29" x="8719" y="0"></use><use xlink:href="#E1929-MJMAIN-29" x="9108" y="0"></use><use xlink:href="#E1929-MJMAIN-2207" x="9497" y="0"></use><use xlink:href="#E1929-MJMATHI-3C0" x="10330" y="0"></use><use xlink:href="#E1929-MJMAIN-28" x="10903" y="0"></use><use xlink:href="#E1929-MJMATHI-61" x="11292" y="0"></use><use xlink:href="#E1929-MJMAIN-2223" x="12098" y="0"></use><g transform="translate(12654,0)"><use xlink:href="#E1929-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1929-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1929-MJMAIN-3B" x="13622" y="0"></use><use xlink:href="#E1929-MJMATHI-3B8" x="14067" y="0"></use><use xlink:href="#E1929-MJMAIN-29" x="14536" y="0"></use></g><g transform="translate(0,353)"><use xlink:href="#E1929-MJMAIN-3D" x="277" y="0"></use><g transform="translate(1333,0)"><use xlink:href="#E1929-MJSZ2-2211" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1929-MJMATHI-61" x="756" y="-1485"></use></g><g transform="translate(2944,0)"><use xlink:href="#E1929-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1929-MJMATHI-74" x="778" y="583"></use></g><g transform="translate(3850,0)"><use xlink:href="#E1929-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1929-MJMATHI-74" x="1111" y="-213"></use></g><use xlink:href="#E1929-MJMAIN-2207" x="4991" y="0"></use><use xlink:href="#E1929-MJMATHI-3C0" x="5824" 
y="0"></use><use xlink:href="#E1929-MJMAIN-28" x="6397" y="0"></use><use xlink:href="#E1929-MJMATHI-61" x="6786" y="0"></use><use xlink:href="#E1929-MJMAIN-2223" x="7593" y="0"></use><g transform="translate(8148,0)"><use xlink:href="#E1929-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1929-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1929-MJMAIN-3B" x="9117" y="0"></use><use xlink:href="#E1929-MJMATHI-3B8" x="9561" y="0"></use><use xlink:href="#E1929-MJMAIN-29" x="10030" y="0"></use><use xlink:href="#E1929-MJMAIN-2212" x="10642" y="0"></use><g transform="translate(11642,0)"><use xlink:href="#E1929-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1929-MJMATHI-74" x="778" y="583"></use></g><use xlink:href="#E1929-MJMATHI-42" x="12548" y="0"></use><use xlink:href="#E1929-MJMAIN-28" x="13307" y="0"></use><g transform="translate(13696,0)"><use xlink:href="#E1929-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1929-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1929-MJMAIN-29" x="14664" y="0"></use><use xlink:href="#E1929-MJMAIN-2207" x="15053" y="0"></use><g transform="translate(16052,0)"><use xlink:href="#E1929-MJSZ2-2211" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1929-MJMATHI-61" x="756" y="-1485"></use></g><use xlink:href="#E1929-MJMATHI-3C0" x="17663" y="0"></use><use xlink:href="#E1929-MJMAIN-28" x="18236" y="0"></use><use xlink:href="#E1929-MJMATHI-61" x="18625" y="0"></use><use xlink:href="#E1929-MJMAIN-2223" x="19432" y="0"></use><g transform="translate(19988,0)"><use xlink:href="#E1929-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1929-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1929-MJMAIN-3B" x="20956" y="0"></use><use xlink:href="#E1929-MJMATHI-3B8" x="21401" y="0"></use><use xlink:href="#E1929-MJMAIN-29" x="21870" y="0"></use></g><g transform="translate(0,-2054)"><use 
xlink:href="#E1929-MJMAIN-3D" x="277" y="0"></use><g transform="translate(1333,0)"><use xlink:href="#E1929-MJSZ2-2211" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1929-MJMATHI-61" x="756" y="-1485"></use></g><g transform="translate(2944,0)"><use xlink:href="#E1929-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1929-MJMATHI-74" x="778" y="583"></use></g><g transform="translate(3850,0)"><use xlink:href="#E1929-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1929-MJMATHI-74" x="1111" y="-213"></use></g><use xlink:href="#E1929-MJMAIN-2207" x="4991" y="0"></use><use xlink:href="#E1929-MJMATHI-3C0" x="5824" y="0"></use><use xlink:href="#E1929-MJMAIN-28" x="6397" y="0"></use><use xlink:href="#E1929-MJMATHI-61" x="6786" y="0"></use><use xlink:href="#E1929-MJMAIN-2223" x="7593" y="0"></use><g transform="translate(8148,0)"><use xlink:href="#E1929-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1929-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1929-MJMAIN-3B" x="9117" y="0"></use><use xlink:href="#E1929-MJMATHI-3B8" x="9561" y="0"></use><use xlink:href="#E1929-MJMAIN-29" x="10030" y="0"></use><use xlink:href="#E1929-MJMAIN-2212" x="10642" y="0"></use><g transform="translate(11642,0)"><use xlink:href="#E1929-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1929-MJMATHI-74" x="778" y="583"></use></g><use xlink:href="#E1929-MJMATHI-42" x="12548" y="0"></use><use xlink:href="#E1929-MJMAIN-28" x="13307" y="0"></use><g transform="translate(13696,0)"><use xlink:href="#E1929-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1929-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1929-MJMAIN-29" x="14664" y="0"></use><use xlink:href="#E1929-MJMAIN-2207" x="15053" y="0"></use><use xlink:href="#E1929-MJMAIN-31" x="15886" y="0"></use></g><g transform="translate(0,-4367)"><use xlink:href="#E1929-MJMAIN-3D" x="277" 
y="0"></use><use xlink:href="#E1929-MJMATHI-45" x="1333" y="0"></use><use xlink:href="#E1929-MJMAIN-5B" x="2097" y="0"></use><g transform="translate(2375,0)"><use xlink:href="#E1929-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1929-MJMATHI-74" x="778" y="583"></use></g><g transform="translate(3281,0)"><use xlink:href="#E1929-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1929-MJMATHI-74" x="1111" y="-213"></use></g><use xlink:href="#E1929-MJMAIN-2207" x="4422" y="0"></use><g transform="translate(5422,0)"><use xlink:href="#E1929-MJMAIN-6C"></use><use xlink:href="#E1929-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1929-MJMATHI-3C0" x="6422" y="0"></use><use xlink:href="#E1929-MJMAIN-28" x="6995" y="0"></use><g transform="translate(7384,0)"><use xlink:href="#E1929-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1929-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1929-MJMAIN-2223" x="8768" y="0"></use><g transform="translate(9323,0)"><use xlink:href="#E1929-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1929-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1929-MJMAIN-3B" x="10292" y="0"></use><use xlink:href="#E1929-MJMATHI-3B8" x="10736" y="0"></use><use xlink:href="#E1929-MJMAIN-29" x="11205" y="0"></use><use xlink:href="#E1929-MJMAIN-5D" x="11594" y="0"></use></g></g></g></g></svg></span></div><script type="math/tex; mode=display" id="MathJax-Element-827">\begin{split}
&E[\gamma^t (G_t - B(S_t)) \nabla \ln \pi(A_t \mid S_t; \theta)] \\
&= \sum_a \gamma^t (G_t - B(S_t)) \nabla \pi(a \mid S_t; \theta) \\
&= \sum_a \gamma^t G_t \nabla \pi(a \mid S_t; \theta) - \gamma^t B(S_t) \nabla \sum_a \pi(a \mid S_t; \theta) \\
&= \sum_a \gamma^t G_t \nabla \pi(a \mid S_t; \theta) - \gamma^t B(S_t) \nabla 1 \\
&= E[\gamma^t G_t \nabla \ln \pi(A_t \mid S_t; \theta)]
\end{split}</script></div></div><p><span>The baseline function can be chosen arbitrarily. For example:</span></p><ul><li><span>Choose the baseline to be the random variable determined by the trajectory </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="21.926ex" height="7.051ex" viewBox="0 -1787.9 9440.2 3035.9" role="img" focusable="false" style="vertical-align: -2.899ex;"><defs><path stroke-width="0" id="E1964-MJMATHI-42" d="M231 637Q204 637 199 638T194 649Q194 676 205 682Q206 683 335 683Q594 683 608 681Q671 671 713 636T756 544Q756 480 698 429T565 360L555 357Q619 348 660 311T702 219Q702 146 630 78T453 1Q446 0 242 0Q42 0 39 2Q35 5 35 10Q35 17 37 24Q42 43 47 45Q51 46 62 46H68Q95 46 128 49Q142 52 147 61Q150 65 219 339T288 628Q288 635 231 637ZM649 544Q649 574 634 600T585 634Q578 636 493 637Q473 637 451 637T416 636H403Q388 635 384 626Q382 622 352 506Q352 503 351 500L320 374H401Q482 374 494 376Q554 386 601 434T649 544ZM595 229Q595 273 572 302T512 336Q506 337 429 337Q311 337 310 336Q310 334 293 263T258 122L240 52Q240 48 252 48T333 46Q422 46 429 47Q491 54 543 105T595 229Z"></path><path stroke-width="0" id="E1964-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1964-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 462Q176 523 208 573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 577T585 459T569 456Q549 456 549 465Q549 471 550 475Q550 478 551 494T553 520Q553 554 544 579T526 616T501 641Q465 662 419 662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 
-18T65 -22Q52 -22 52 -14Q52 -11 110 221Q112 227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 153Q144 114 160 87T203 47T255 29T308 24Z"></path><path stroke-width="0" id="E1964-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E1964-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path><path stroke-width="0" id="E1964-MJMAIN-3D" d="M56 347Q56 360 70 367H707Q722 359 722 347Q722 336 708 328L390 327H72Q56 332 56 347ZM56 153Q56 168 72 173H708Q722 163 722 153Q722 140 707 133H70Q56 140 56 153Z"></path><path stroke-width="0" id="E1964-MJMAIN-2212" d="M84 237T84 250T98 270H679Q694 262 694 250T679 230H98Q84 237 84 250Z"></path><path stroke-width="0" id="E1964-MJSZ2-2211" d="M60 948Q63 950 665 950H1267L1325 815Q1384 677 1388 669H1348L1341 683Q1320 724 1285 761Q1235 809 1174 838T1033 881T882 898T699 902H574H543H251L259 891Q722 258 724 252Q725 250 724 246Q721 243 460 -56L196 -356Q196 -357 407 -357Q459 -357 548 -357T676 -358Q812 -358 896 -353T1063 -332T1204 -283T1307 -196Q1328 -170 1348 -124H1388Q1388 -125 1381 -145T1356 -210T1325 -294L1267 -449L666 -450Q64 -450 61 -448Q55 -446 55 -439Q55 -437 57 -433L590 177Q590 178 557 222T452 366T322 544L56 909L55 924Q55 945 60 948Z"></path><path stroke-width="0" id="E1964-MJMATHI-3C4" d="M39 284Q18 284 18 294Q18 301 45 338T99 398Q134 425 164 
429Q170 431 332 431Q492 431 497 429Q517 424 517 402Q517 388 508 376T485 360Q479 358 389 358T299 356Q298 355 283 274T251 109T233 20Q228 5 215 -4T186 -13Q153 -13 153 20V30L203 192Q214 228 227 272T248 336L254 357Q254 358 208 358Q206 358 197 358T183 359Q105 359 61 295Q56 287 53 286T39 284Z"></path><path stroke-width="0" id="E1964-MJMAIN-31" d="M213 578L200 573Q186 568 160 563T102 556H83V602H102Q149 604 189 617T245 641T273 663Q275 666 285 666Q294 666 302 660V361L303 61Q310 54 315 52T339 48T401 46H427V0H416Q395 3 257 3Q121 3 100 0H88V46H114Q136 46 152 46T177 47T193 50T201 52T207 57T213 61V578Z"></path><path stroke-width="0" id="E1964-MJMATHI-3B3" d="M31 249Q11 249 11 258Q11 275 26 304T66 365T129 418T206 441Q233 441 239 440Q287 429 318 386T371 255Q385 195 385 170Q385 166 386 166L398 193Q418 244 443 300T486 391T508 430Q510 431 524 431H537Q543 425 543 422Q543 418 522 378T463 251T391 71Q385 55 378 6T357 -100Q341 -165 330 -190T303 -216Q286 -216 286 -188Q286 -138 340 32L346 51L347 69Q348 79 348 100Q348 257 291 317Q251 355 196 355Q148 355 108 329T51 260Q49 251 47 251Q45 249 31 249Z"></path><path stroke-width="0" id="E1964-MJMATHI-52" d="M230 637Q203 637 198 638T193 649Q193 676 204 682Q206 683 378 683Q550 682 564 680Q620 672 658 652T712 606T733 563T739 529Q739 484 710 445T643 385T576 351T538 338L545 333Q612 295 612 223Q612 212 607 162T602 80V71Q602 53 603 43T614 25T640 16Q668 16 686 38T712 85Q717 99 720 102T735 105Q755 105 755 93Q755 75 731 36Q693 -21 641 -21H632Q571 -21 531 4T487 82Q487 109 502 166T517 239Q517 290 474 313Q459 320 449 321T378 323H309L277 193Q244 61 244 59Q244 55 245 54T252 50T269 48T302 46H333Q339 38 339 37T336 19Q332 6 326 0H311Q275 2 180 2Q146 2 117 2T71 2T50 1Q33 1 33 10Q33 12 36 24Q41 43 46 45Q50 46 61 46H67Q94 46 127 49Q141 52 146 61Q149 65 218 339T287 628Q287 635 230 637ZM630 554Q630 586 609 608T523 636Q521 636 500 636T462 637H440Q393 637 386 627Q385 624 352 494T319 361Q319 360 388 360Q466 361 492 367Q556 377 592 426Q608 449 619 486T630 
554Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1964-MJMATHI-42" x="0" y="0"></use><use xlink:href="#E1964-MJMAIN-28" x="759" y="0"></use><g transform="translate(1148,0)"><use xlink:href="#E1964-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1964-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1964-MJMAIN-29" x="2116" y="0"></use><use xlink:href="#E1964-MJMAIN-3D" x="2783" y="0"></use><use xlink:href="#E1964-MJMAIN-2212" x="3838" y="0"></use><g transform="translate(4783,0)"><use xlink:href="#E1964-MJSZ2-2211" x="0" y="0"></use><g transform="translate(87,-1088)"><use transform="scale(0.707)" xlink:href="#E1964-MJMATHI-3C4" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1964-MJMAIN-3D" x="517" y="0"></use><use transform="scale(0.707)" xlink:href="#E1964-MJMAIN-31" x="1295" y="0"></use></g><g transform="translate(142,1150)"><use transform="scale(0.707)" xlink:href="#E1964-MJMATHI-74" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1964-MJMAIN-2212" x="361" y="0"></use><use transform="scale(0.707)" xlink:href="#E1964-MJMAIN-31" x="1139" y="0"></use></g></g><g transform="translate(6394,0)"><use xlink:href="#E1964-MJMATHI-3B3" x="0" y="0"></use><g transform="translate(550,412)"><use transform="scale(0.707)" xlink:href="#E1964-MJMATHI-3C4" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1964-MJMAIN-2212" x="517" y="0"></use><use transform="scale(0.707)" xlink:href="#E1964-MJMATHI-74" x="1295" y="0"></use></g></g><g transform="translate(8215,0)"><use xlink:href="#E1964-MJMATHI-52" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1964-MJMATHI-3C4" x="1073" y="-213"></use></g></g></svg></span><script type="math/tex">\displaystyle B(S_t)=-\sum_{\tau=1}^{t-1} \gamma^{\tau-t}R_\tau</script><span>, in which case </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg 
xmlns:xlink="http://www.w3.org/1999/xlink" width="34.909ex" height="6.955ex" viewBox="0 -1746.4 15030.2 2994.3" role="img" focusable="false" style="vertical-align: -2.899ex;"><defs><path stroke-width="0" id="E1965-MJMATHI-3B3" d="M31 249Q11 249 11 258Q11 275 26 304T66 365T129 418T206 441Q233 441 239 440Q287 429 318 386T371 255Q385 195 385 170Q385 166 386 166L398 193Q418 244 443 300T486 391T508 430Q510 431 524 431H537Q543 425 543 422Q543 418 522 378T463 251T391 71Q385 55 378 6T357 -100Q341 -165 330 -190T303 -216Q286 -216 286 -188Q286 -138 340 32L346 51L347 69Q348 79 348 100Q348 257 291 317Q251 355 196 355Q148 355 108 329T51 260Q49 251 47 251Q45 249 31 249Z"></path><path stroke-width="0" id="E1965-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E1965-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1965-MJMATHI-47" d="M50 252Q50 367 117 473T286 641T490 704Q580 704 633 653Q642 643 648 636T656 626L657 623Q660 623 684 649Q691 655 699 663T715 679T725 690L740 705H746Q760 705 760 698Q760 694 728 561Q692 422 692 421Q690 416 687 415T669 413H653Q647 419 647 422Q647 423 648 429T650 449T651 481Q651 552 619 605T510 659Q492 659 471 656T418 643T357 615T294 567T236 496T189 394T158 260Q156 242 156 221Q156 173 170 136T206 79T256 45T308 28T353 24Q407 24 452 47T514 
106Q517 114 529 161T541 214Q541 222 528 224T468 227H431Q425 233 425 235T427 254Q431 267 437 273H454Q494 271 594 271Q634 271 659 271T695 272T707 272Q721 272 721 263Q721 261 719 249Q714 230 709 228Q706 227 694 227Q674 227 653 224Q646 221 643 215T629 164Q620 131 614 108Q589 6 586 3Q584 1 581 1Q571 1 553 21T530 52Q530 53 528 52T522 47Q448 -22 322 -22Q201 -22 126 55T50 252Z"></path><path stroke-width="0" id="E1965-MJMAIN-2212" d="M84 237T84 250T98 270H679Q694 262 694 250T679 230H98Q84 237 84 250Z"></path><path stroke-width="0" id="E1965-MJMATHI-42" d="M231 637Q204 637 199 638T194 649Q194 676 205 682Q206 683 335 683Q594 683 608 681Q671 671 713 636T756 544Q756 480 698 429T565 360L555 357Q619 348 660 311T702 219Q702 146 630 78T453 1Q446 0 242 0Q42 0 39 2Q35 5 35 10Q35 17 37 24Q42 43 47 45Q51 46 62 46H68Q95 46 128 49Q142 52 147 61Q150 65 219 339T288 628Q288 635 231 637ZM649 544Q649 574 634 600T585 634Q578 636 493 637Q473 637 451 637T416 636H403Q388 635 384 626Q382 622 352 506Q352 503 351 500L320 374H401Q482 374 494 376Q554 386 601 434T649 544ZM595 229Q595 273 572 302T512 336Q506 337 429 337Q311 337 310 336Q310 334 293 263T258 122L240 52Q240 48 252 48T333 46Q422 46 429 47Q491 54 543 105T595 229Z"></path><path stroke-width="0" id="E1965-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 462Q176 523 208 573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 577T585 459T569 456Q549 456 549 465Q549 471 550 475Q550 478 551 494T553 520Q553 554 544 579T526 616T501 641Q465 662 419 662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 -18T65 -22Q52 -22 52 -14Q52 -11 110 221Q112 227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 153Q144 114 160 87T203 47T255 29T308 24Z"></path><path stroke-width="0" 
id="E1965-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path><path stroke-width="0" id="E1965-MJMAIN-3D" d="M56 347Q56 360 70 367H707Q722 359 722 347Q722 336 708 328L390 327H72Q56 332 56 347ZM56 153Q56 168 72 173H708Q722 163 722 153Q722 140 707 133H70Q56 140 56 153Z"></path><path stroke-width="0" id="E1965-MJSZ2-2211" d="M60 948Q63 950 665 950H1267L1325 815Q1384 677 1388 669H1348L1341 683Q1320 724 1285 761Q1235 809 1174 838T1033 881T882 898T699 902H574H543H251L259 891Q722 258 724 252Q725 250 724 246Q721 243 460 -56L196 -356Q196 -357 407 -357Q459 -357 548 -357T676 -358Q812 -358 896 -353T1063 -332T1204 -283T1307 -196Q1328 -170 1348 -124H1388Q1388 -125 1381 -145T1356 -210T1325 -294L1267 -449L666 -450Q64 -450 61 -448Q55 -446 55 -439Q55 -437 57 -433L590 177Q590 178 557 222T452 366T322 544L56 909L55 924Q55 945 60 948Z"></path><path stroke-width="0" id="E1965-MJMATHI-3C4" d="M39 284Q18 284 18 294Q18 301 45 338T99 398Q134 425 164 429Q170 431 332 431Q492 431 497 429Q517 424 517 402Q517 388 508 376T485 360Q479 358 389 358T299 356Q298 355 283 274T251 109T233 20Q228 5 215 -4T186 -13Q153 -13 153 20V30L203 192Q214 228 227 272T248 336L254 357Q254 358 208 358Q206 358 197 358T183 359Q105 359 61 295Q56 287 53 286T39 284Z"></path><path stroke-width="0" id="E1965-MJMAIN-30" d="M96 585Q152 666 249 666Q297 666 345 640T423 548Q460 465 460 320Q460 165 417 83Q397 41 362 16T301 -15T250 -22Q224 -22 198 -16T137 16T82 83Q39 165 39 320Q39 494 96 585ZM321 597Q291 629 250 629Q208 629 178 597Q153 571 145 525T137 333Q137 175 145 125T181 46Q209 16 250 16Q290 16 318 46Q347 76 354 130T362 333Q362 478 354 524T321 597Z"></path><path stroke-width="0" id="E1965-MJMAIN-2B" d="M56 237T56 250T70 270H369V420L370 570Q380 583 389 583Q402 583 409 568V270H707Q722 262 722 250T707 230H409V-68Q401 
-82 391 -82H389H387Q375 -82 369 -68V230H70Q56 237 56 250Z"></path><path stroke-width="0" id="E1965-MJMAIN-221E" d="M55 217Q55 305 111 373T254 442Q342 442 419 381Q457 350 493 303L507 284L514 294Q618 442 747 442Q833 442 888 374T944 214Q944 128 889 59T743 -11Q657 -11 580 50Q542 81 506 128L492 147L485 137Q381 -11 252 -11Q166 -11 111 57T55 217ZM907 217Q907 285 869 341T761 397Q740 397 720 392T682 378T648 359T619 335T594 310T574 285T559 263T548 246L543 238L574 198Q605 158 622 138T664 94T714 61T765 51Q827 51 867 100T907 217ZM92 214Q92 145 131 89T239 33Q357 33 456 193L425 233Q364 312 334 337Q285 380 233 380Q171 380 132 331T92 214Z"></path><path stroke-width="0" id="E1965-MJMATHI-52" d="M230 637Q203 637 198 638T193 649Q193 676 204 682Q206 683 378 683Q550 682 564 680Q620 672 658 652T712 606T733 563T739 529Q739 484 710 445T643 385T576 351T538 338L545 333Q612 295 612 223Q612 212 607 162T602 80V71Q602 53 603 43T614 25T640 16Q668 16 686 38T712 85Q717 99 720 102T735 105Q755 105 755 93Q755 75 731 36Q693 -21 641 -21H632Q571 -21 531 4T487 82Q487 109 502 166T517 239Q517 290 474 313Q459 320 449 321T378 323H309L277 193Q244 61 244 59Q244 55 245 54T252 50T269 48T302 46H333Q339 38 339 37T336 19Q332 6 326 0H311Q275 2 180 2Q146 2 117 2T71 2T50 1Q33 1 33 10Q33 12 36 24Q41 43 46 45Q50 46 61 46H67Q94 46 127 49Q141 52 146 61Q149 65 218 339T287 628Q287 635 230 637ZM630 554Q630 586 609 608T523 636Q521 636 500 636T462 637H440Q393 637 386 627Q385 624 352 494T319 361Q319 360 388 360Q466 361 492 367Q556 377 592 426Q608 449 619 486T630 554Z"></path><path stroke-width="0" id="E1965-MJMAIN-31" d="M213 578L200 573Q186 568 160 563T102 556H83V602H102Q149 604 189 617T245 641T273 663Q275 666 285 666Q294 666 302 660V361L303 61Q310 54 315 52T339 48T401 46H427V0H416Q395 3 257 3Q121 3 100 0H88V46H114Q136 46 152 46T177 47T193 50T201 52T207 57T213 61V578Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1965-MJMATHI-3B3" x="0" 
y="0"></use><use transform="scale(0.707)" xlink:href="#E1965-MJMATHI-74" x="778" y="583"></use><use xlink:href="#E1965-MJMAIN-28" x="905" y="0"></use><g transform="translate(1294,0)"><use xlink:href="#E1965-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1965-MJMATHI-74" x="1111" y="-213"></use></g><use xlink:href="#E1965-MJMAIN-2212" x="2658" y="0"></use><use xlink:href="#E1965-MJMATHI-42" x="3658" y="0"></use><use xlink:href="#E1965-MJMAIN-28" x="4417" y="0"></use><g transform="translate(4806,0)"><use xlink:href="#E1965-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1965-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1965-MJMAIN-29" x="5774" y="0"></use><use xlink:href="#E1965-MJMAIN-29" x="6163" y="0"></use><use xlink:href="#E1965-MJMAIN-3D" x="6830" y="0"></use><g transform="translate(7886,0)"><use xlink:href="#E1965-MJSZ2-2211" x="0" y="0"></use><g transform="translate(87,-1088)"><use transform="scale(0.707)" xlink:href="#E1965-MJMATHI-3C4" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1965-MJMAIN-3D" x="517" y="0"></use><use transform="scale(0.707)" xlink:href="#E1965-MJMAIN-30" x="1295" y="0"></use></g><g transform="translate(93,1150)"><use transform="scale(0.707)" xlink:href="#E1965-MJMAIN-2B" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1965-MJMAIN-221E" x="778" y="0"></use></g></g><g transform="translate(9497,0)"><use xlink:href="#E1965-MJMATHI-3B3" x="0" y="0"></use><g transform="translate(550,412)"><use transform="scale(0.707)" xlink:href="#E1965-MJMATHI-3C4" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1965-MJMAIN-2B" x="517" y="0"></use><use transform="scale(0.707)" xlink:href="#E1965-MJMATHI-74" x="1295" y="0"></use></g></g><g transform="translate(11318,0)"><use xlink:href="#E1965-MJMATHI-52" x="0" y="0"></use><g transform="translate(759,-150)"><use transform="scale(0.707)" xlink:href="#E1965-MJMATHI-74" x="0" y="0"></use><use 
transform="scale(0.707)" xlink:href="#E1965-MJMAIN-2B" x="361" y="0"></use><use transform="scale(0.707)" xlink:href="#E1965-MJMATHI-3C4" x="1139" y="0"></use><use transform="scale(0.707)" xlink:href="#E1965-MJMAIN-2B" x="1656" y="0"></use><use transform="scale(0.707)" xlink:href="#E1965-MJMAIN-31" x="2434" y="0"></use></g></g><use xlink:href="#E1965-MJMAIN-2B" x="14252" y="0"></use></g></svg></span><script type="math/tex">\displaystyle \gamma^t(G_t-B(S_t))=\sum_{\tau=0}^{+\infty}\gamma^{\tau+t}R_{t+\tau+1}+</script><span> </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="29.063ex" height="7.051ex" viewBox="0 -1787.9 12513.1 3035.9" role="img" focusable="false" style="vertical-align: -2.899ex;"><defs><path stroke-width="0" id="E1966-MJSZ2-2211" d="M60 948Q63 950 665 950H1267L1325 815Q1384 677 1388 669H1348L1341 683Q1320 724 1285 761Q1235 809 1174 838T1033 881T882 898T699 902H574H543H251L259 891Q722 258 724 252Q725 250 724 246Q721 243 460 -56L196 -356Q196 -357 407 -357Q459 -357 548 -357T676 -358Q812 -358 896 -353T1063 -332T1204 -283T1307 -196Q1328 -170 1348 -124H1388Q1388 -125 1381 -145T1356 -210T1325 -294L1267 -449L666 -450Q64 -450 61 -448Q55 -446 55 -439Q55 -437 57 -433L590 177Q590 178 557 222T452 366T322 544L56 909L55 924Q55 945 60 948Z"></path><path stroke-width="0" id="E1966-MJMATHI-3C4" d="M39 284Q18 284 18 294Q18 301 45 338T99 398Q134 425 164 429Q170 431 332 431Q492 431 497 429Q517 424 517 402Q517 388 508 376T485 360Q479 358 389 358T299 356Q298 355 283 274T251 109T233 20Q228 5 215 -4T186 -13Q153 -13 153 20V30L203 192Q214 228 227 272T248 336L254 357Q254 358 208 358Q206 358 197 358T183 359Q105 359 61 295Q56 287 53 286T39 284Z"></path><path stroke-width="0" id="E1966-MJMAIN-3D" d="M56 347Q56 360 70 367H707Q722 359 722 347Q722 336 708 328L390 327H72Q56 332 56 347ZM56 153Q56 168 72 173H708Q722 163 722 153Q722 140 707 133H70Q56 140 56 153Z"></path><path 
stroke-width="0" id="E1966-MJMAIN-31" d="M213 578L200 573Q186 568 160 563T102 556H83V602H102Q149 604 189 617T245 641T273 663Q275 666 285 666Q294 666 302 660V361L303 61Q310 54 315 52T339 48T401 46H427V0H416Q395 3 257 3Q121 3 100 0H88V46H114Q136 46 152 46T177 47T193 50T201 52T207 57T213 61V578Z"></path><path stroke-width="0" id="E1966-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E1966-MJMAIN-2212" d="M84 237T84 250T98 270H679Q694 262 694 250T679 230H98Q84 237 84 250Z"></path><path stroke-width="0" id="E1966-MJMATHI-3B3" d="M31 249Q11 249 11 258Q11 275 26 304T66 365T129 418T206 441Q233 441 239 440Q287 429 318 386T371 255Q385 195 385 170Q385 166 386 166L398 193Q418 244 443 300T486 391T508 430Q510 431 524 431H537Q543 425 543 422Q543 418 522 378T463 251T391 71Q385 55 378 6T357 -100Q341 -165 330 -190T303 -216Q286 -216 286 -188Q286 -138 340 32L346 51L347 69Q348 79 348 100Q348 257 291 317Q251 355 196 355Q148 355 108 329T51 260Q49 251 47 251Q45 249 31 249Z"></path><path stroke-width="0" id="E1966-MJMATHI-52" d="M230 637Q203 637 198 638T193 649Q193 676 204 682Q206 683 378 683Q550 682 564 680Q620 672 658 652T712 606T733 563T739 529Q739 484 710 445T643 385T576 351T538 338L545 333Q612 295 612 223Q612 212 607 162T602 80V71Q602 53 603 43T614 25T640 16Q668 16 686 38T712 85Q717 99 720 102T735 105Q755 105 755 93Q755 75 731 36Q693 -21 641 -21H632Q571 -21 531 4T487 82Q487 109 502 166T517 239Q517 290 474 313Q459 320 449 321T378 323H309L277 193Q244 61 244 59Q244 55 245 54T252 50T269 48T302 46H333Q339 38 339 
37T336 19Q332 6 326 0H311Q275 2 180 2Q146 2 117 2T71 2T50 1Q33 1 33 10Q33 12 36 24Q41 43 46 45Q50 46 61 46H67Q94 46 127 49Q141 52 146 61Q149 65 218 339T287 628Q287 635 230 637ZM630 554Q630 586 609 608T523 636Q521 636 500 636T462 637H440Q393 637 386 627Q385 624 352 494T319 361Q319 360 388 360Q466 361 492 367Q556 377 592 426Q608 449 619 486T630 554Z"></path><path stroke-width="0" id="E1966-MJMAIN-30" d="M96 585Q152 666 249 666Q297 666 345 640T423 548Q460 465 460 320Q460 165 417 83Q397 41 362 16T301 -15T250 -22Q224 -22 198 -16T137 16T82 83Q39 165 39 320Q39 494 96 585ZM321 597Q291 629 250 629Q208 629 178 597Q153 571 145 525T137 333Q137 175 145 125T181 46Q209 16 250 16Q290 16 318 46Q347 76 354 130T362 333Q362 478 354 524T321 597Z"></path><path stroke-width="0" id="E1966-MJMAIN-2B" d="M56 237T56 250T70 270H369V420L370 570Q380 583 389 583Q402 583 409 568V270H707Q722 262 722 250T707 230H409V-68Q401 -82 391 -82H389H387Q375 -82 369 -68V230H70Q56 237 56 250Z"></path><path stroke-width="0" id="E1966-MJMAIN-221E" d="M55 217Q55 305 111 373T254 442Q342 442 419 381Q457 350 493 303L507 284L514 294Q618 442 747 442Q833 442 888 374T944 214Q944 128 889 59T743 -11Q657 -11 580 50Q542 81 506 128L492 147L485 137Q381 -11 252 -11Q166 -11 111 57T55 217ZM907 217Q907 285 869 341T761 397Q740 397 720 392T682 378T648 359T619 335T594 310T574 285T559 263T548 246L543 238L574 198Q605 158 622 138T664 94T714 61T765 51Q827 51 867 100T907 217ZM92 214Q92 145 131 89T239 33Q357 33 456 193L425 233Q364 312 334 337Q285 380 233 380Q171 380 132 331T92 214Z"></path><path stroke-width="0" id="E1966-MJMATHI-47" d="M50 252Q50 367 117 473T286 641T490 704Q580 704 633 653Q642 643 648 636T656 626L657 623Q660 623 684 649Q691 655 699 663T715 679T725 690L740 705H746Q760 705 760 698Q760 694 728 561Q692 422 692 421Q690 416 687 415T669 413H653Q647 419 647 422Q647 423 648 429T650 449T651 481Q651 552 619 605T510 659Q492 659 471 656T418 643T357 615T294 567T236 496T189 394T158 260Q156 242 156 221Q156 173 170 136T206 79T256 45T308 
28T353 24Q407 24 452 47T514 106Q517 114 529 161T541 214Q541 222 528 224T468 227H431Q425 233 425 235T427 254Q431 267 437 273H454Q494 271 594 271Q634 271 659 271T695 272T707 272Q721 272 721 263Q721 261 719 249Q714 230 709 228Q706 227 694 227Q674 227 653 224Q646 221 643 215T629 164Q620 131 614 108Q589 6 586 3Q584 1 581 1Q571 1 553 21T530 52Q530 53 528 52T522 47Q448 -22 322 -22Q201 -22 126 55T50 252Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1966-MJSZ2-2211" x="0" y="0"></use><g transform="translate(87,-1088)"><use transform="scale(0.707)" xlink:href="#E1966-MJMATHI-3C4" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1966-MJMAIN-3D" x="517" y="0"></use><use transform="scale(0.707)" xlink:href="#E1966-MJMAIN-31" x="1295" y="0"></use></g><g transform="translate(142,1150)"><use transform="scale(0.707)" xlink:href="#E1966-MJMATHI-74" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1966-MJMAIN-2212" x="361" y="0"></use><use transform="scale(0.707)" xlink:href="#E1966-MJMAIN-31" x="1139" y="0"></use></g><g transform="translate(1610,0)"><use xlink:href="#E1966-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1966-MJMATHI-3C4" x="778" y="583"></use></g><g transform="translate(2626,0)"><use xlink:href="#E1966-MJMATHI-52" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1966-MJMATHI-3C4" x="1073" y="-213"></use></g><use xlink:href="#E1966-MJMAIN-3D" x="4129" y="0"></use><g transform="translate(5184,0)"><use xlink:href="#E1966-MJSZ2-2211" x="0" y="0"></use><g transform="translate(87,-1088)"><use transform="scale(0.707)" xlink:href="#E1966-MJMATHI-3C4" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1966-MJMAIN-3D" x="517" y="0"></use><use transform="scale(0.707)" xlink:href="#E1966-MJMAIN-30" x="1295" y="0"></use></g><g transform="translate(93,1150)"><use transform="scale(0.707)" xlink:href="#E1966-MJMAIN-2B" 
x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1966-MJMAIN-221E" x="778" y="0"></use></g></g><g transform="translate(6795,0)"><use xlink:href="#E1966-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1966-MJMATHI-3C4" x="778" y="583"></use></g><g transform="translate(7811,0)"><use xlink:href="#E1966-MJMATHI-52" x="0" y="0"></use><g transform="translate(759,-150)"><use transform="scale(0.707)" xlink:href="#E1966-MJMATHI-3C4" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1966-MJMAIN-2B" x="517" y="0"></use><use transform="scale(0.707)" xlink:href="#E1966-MJMAIN-31" x="1295" y="0"></use></g></g><use xlink:href="#E1966-MJMAIN-3D" x="10217" y="0"></use><g transform="translate(11273,0)"><use xlink:href="#E1966-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1966-MJMAIN-30" x="1111" y="-213"></use></g></g></svg></span><script type="math/tex">\displaystyle \sum_{\tau=1}^{t-1}\gamma^\tau R_\tau=\sum_{\tau=0}^{+\infty}\gamma^\tau R_{\tau+1}=G_0</script><span>, so the gradient takes the form </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="22.603ex" height="2.71ex" viewBox="0 -832.7 9731.6 1166.9" role="img" focusable="false" style="vertical-align: -0.776ex;"><defs><path stroke-width="0" id="E1967-MJMATHI-45" d="M492 213Q472 213 472 226Q472 230 477 250T482 285Q482 316 461 323T364 330H312Q311 328 277 192T243 52Q243 48 254 48T334 46Q428 46 458 48T518 61Q567 77 599 117T670 248Q680 270 683 272Q690 274 698 274Q718 274 718 261Q613 7 608 2Q605 0 322 0H133Q31 0 31 11Q31 13 34 25Q38 41 42 43T65 46Q92 46 125 49Q139 52 144 61Q146 66 215 342T285 622Q285 629 281 629Q273 632 228 634H197Q191 640 191 642T193 659Q197 676 203 680H757Q764 676 764 669Q764 664 751 557T737 447Q735 440 717 440H705Q698 445 698 453L701 476Q704 500 704 528Q704 558 697 578T678 609T643 625T596 632T532 634H485Q397 633 392 631Q388 629 386 622Q385 619 355 499T324 
377Q347 376 372 376H398Q464 376 489 391T534 472Q538 488 540 490T557 493Q562 493 565 493T570 492T572 491T574 487T577 483L544 351Q511 218 508 216Q505 213 492 213Z"></path><path stroke-width="0" id="E1967-MJMAIN-5B" d="M118 -250V750H255V710H158V-210H255V-250H118Z"></path><path stroke-width="0" id="E1967-MJMATHI-47" d="M50 252Q50 367 117 473T286 641T490 704Q580 704 633 653Q642 643 648 636T656 626L657 623Q660 623 684 649Q691 655 699 663T715 679T725 690L740 705H746Q760 705 760 698Q760 694 728 561Q692 422 692 421Q690 416 687 415T669 413H653Q647 419 647 422Q647 423 648 429T650 449T651 481Q651 552 619 605T510 659Q492 659 471 656T418 643T357 615T294 567T236 496T189 394T158 260Q156 242 156 221Q156 173 170 136T206 79T256 45T308 28T353 24Q407 24 452 47T514 106Q517 114 529 161T541 214Q541 222 528 224T468 227H431Q425 233 425 235T427 254Q431 267 437 273H454Q494 271 594 271Q634 271 659 271T695 272T707 272Q721 272 721 263Q721 261 719 249Q714 230 709 228Q706 227 694 227Q674 227 653 224Q646 221 643 215T629 164Q620 131 614 108Q589 6 586 3Q584 1 581 1Q571 1 553 21T530 52Q530 53 528 52T522 47Q448 -22 322 -22Q201 -22 126 55T50 252Z"></path><path stroke-width="0" id="E1967-MJMAIN-30" d="M96 585Q152 666 249 666Q297 666 345 640T423 548Q460 465 460 320Q460 165 417 83Q397 41 362 16T301 -15T250 -22Q224 -22 198 -16T137 16T82 83Q39 165 39 320Q39 494 96 585ZM321 597Q291 629 250 629Q208 629 178 597Q153 571 145 525T137 333Q137 175 145 125T181 46Q209 16 250 16Q290 16 318 46Q347 76 354 130T362 333Q362 478 354 524T321 597Z"></path><path stroke-width="0" id="E1967-MJMAIN-2207" d="M46 676Q46 679 51 683H781Q786 679 786 676Q786 674 617 326T444 -26Q439 -33 416 -33T388 -26Q385 -22 216 326T46 676ZM697 596Q697 597 445 597T193 596Q195 591 319 336T445 80L697 596Z"></path><path stroke-width="0" id="E1967-MJMAIN-6C" d="M42 46H56Q95 46 103 60V68Q103 77 103 91T103 124T104 167T104 217T104 272T104 329Q104 366 104 407T104 482T104 542T103 586T103 603Q100 622 89 628T44 637H26V660Q26 683 28 683L38 684Q48 685 67 686T104 
688Q121 689 141 690T171 693T182 694H185V379Q185 62 186 60Q190 52 198 49Q219 46 247 46H263V0H255L232 1Q209 2 183 2T145 3T107 3T57 1L34 0H26V46H42Z"></path><path stroke-width="0" id="E1967-MJMAIN-6E" d="M41 46H55Q94 46 102 60V68Q102 77 102 91T102 122T103 161T103 203Q103 234 103 269T102 328V351Q99 370 88 376T43 385H25V408Q25 431 27 431L37 432Q47 433 65 434T102 436Q119 437 138 438T167 441T178 442H181V402Q181 364 182 364T187 369T199 384T218 402T247 421T285 437Q305 442 336 442Q450 438 463 329Q464 322 464 190V104Q464 66 466 59T477 49Q498 46 526 46H542V0H534L510 1Q487 2 460 2T422 3Q319 3 310 0H302V46H318Q379 46 379 62Q380 64 380 200Q379 335 378 343Q372 371 358 385T334 402T308 404Q263 404 229 370Q202 343 195 315T187 232V168V108Q187 78 188 68T191 55T200 49Q221 46 249 46H265V0H257L234 1Q210 2 183 2T145 3Q42 3 33 0H25V46H41Z"></path><path stroke-width="0" id="E1967-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1967-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1967-MJMATHI-41" d="M208 74Q208 50 254 46Q272 46 272 35Q272 34 270 22Q267 8 264 4T251 0Q249 0 239 0T205 1T141 2Q70 2 50 0H42Q35 7 35 11Q37 38 48 46H62Q132 49 164 96Q170 102 345 401T523 704Q530 716 547 716H555H572Q578 707 578 706L606 383Q634 60 636 57Q641 46 701 46Q726 46 726 36Q726 34 723 22Q720 7 718 4T704 0Q701 0 
690 0T651 1T578 2Q484 2 455 0H443Q437 6 437 9T439 27Q443 40 445 43L449 46H469Q523 49 533 63L521 213H283L249 155Q208 86 208 74ZM516 260Q516 271 504 416T490 562L463 519Q447 492 400 412L310 260L413 259Q516 259 516 260Z"></path><path stroke-width="0" id="E1967-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E1967-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path stroke-width="0" id="E1967-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 462Q176 523 208 573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 577T585 459T569 456Q549 456 549 465Q549 471 550 475Q550 478 551 494T553 520Q553 554 544 579T526 616T501 641Q465 662 419 662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 -18T65 -22Q52 -22 52 -14Q52 -11 110 221Q112 227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 153Q144 114 160 87T203 47T255 29T308 24Z"></path><path stroke-width="0" id="E1967-MJMAIN-3B" d="M78 370Q78 394 95 412T138 430Q162 430 180 414T199 371Q199 346 182 328T139 310T96 327T78 370ZM78 60Q78 85 94 103T137 121Q202 121 202 8Q202 -44 183 -94T144 -169T118 -194Q115 -194 106 -186T95 -174Q94 -171 107 -155T137 -107T160 -38Q161 -32 162 -22T165 -4T165 4Q165 5 
161 4T142 0Q110 0 94 18T78 60Z"></path><path stroke-width="0" id="E1967-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E1967-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path><path stroke-width="0" id="E1967-MJMAIN-5D" d="M22 710V750H159V-250H22V-210H119V710H22Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1967-MJMATHI-45" x="0" y="0"></use><use xlink:href="#E1967-MJMAIN-5B" x="764" y="0"></use><g transform="translate(1042,0)"><use xlink:href="#E1967-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1967-MJMAIN-30" x="1111" y="-213"></use></g><use xlink:href="#E1967-MJMAIN-2207" x="2281" y="0"></use><g transform="translate(3281,0)"><use xlink:href="#E1967-MJMAIN-6C"></use><use xlink:href="#E1967-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1967-MJMATHI-3C0" x="4281" y="0"></use><use xlink:href="#E1967-MJMAIN-28" x="4854" y="0"></use><g transform="translate(5243,0)"><use xlink:href="#E1967-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1967-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1967-MJMAIN-2223" x="6626" y="0"></use><g transform="translate(7182,0)"><use xlink:href="#E1967-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1967-MJMATHI-74" x="866" y="-213"></use></g><use 
xlink:href="#E1967-MJMAIN-3B" x="8150" y="0"></use><use xlink:href="#E1967-MJMATHI-3B8" x="8595" y="0"></use><use xlink:href="#E1967-MJMAIN-29" x="9064" y="0"></use><use xlink:href="#E1967-MJMAIN-5D" x="9453" y="0"></use></g></svg></span><script type="math/tex">E[G_0 \nabla \ln \pi(A_t \mid S_t; \theta)]</script><span>.</span></li><li><span>Choose the baseline function </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="17.256ex" height="2.807ex" viewBox="0 -874.2 7429.5 1208.4" role="img" focusable="false" style="vertical-align: -0.776ex;"><defs><path stroke-width="0" id="E1968-MJMATHI-42" d="M231 637Q204 637 199 638T194 649Q194 676 205 682Q206 683 335 683Q594 683 608 681Q671 671 713 636T756 544Q756 480 698 429T565 360L555 357Q619 348 660 311T702 219Q702 146 630 78T453 1Q446 0 242 0Q42 0 39 2Q35 5 35 10Q35 17 37 24Q42 43 47 45Q51 46 62 46H68Q95 46 128 49Q142 52 147 61Q150 65 219 339T288 628Q288 635 231 637ZM649 544Q649 574 634 600T585 634Q578 636 493 637Q473 637 451 637T416 636H403Q388 635 384 626Q382 622 352 506Q352 503 351 500L320 374H401Q482 374 494 376Q554 386 601 434T649 544ZM595 229Q595 273 572 302T512 336Q506 337 429 337Q311 337 310 336Q310 334 293 263T258 122L240 52Q240 48 252 48T333 46Q422 46 429 47Q491 54 543 105T595 229Z"></path><path stroke-width="0" id="E1968-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1968-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 462Q176 523 208 573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 577T585 459T569 456Q549 456 549 465Q549 471 550 475Q550 478 551 494T553 
520Q553 554 544 579T526 616T501 641Q465 662 419 662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 -18T65 -22Q52 -22 52 -14Q52 -11 110 221Q112 227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 153Q144 114 160 87T203 47T255 29T308 24Z"></path><path stroke-width="0" id="E1968-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E1968-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path><path stroke-width="0" id="E1968-MJMAIN-3D" d="M56 347Q56 360 70 367H707Q722 359 722 347Q722 336 708 328L390 327H72Q56 332 56 347ZM56 153Q56 168 72 173H708Q722 163 722 153Q722 140 707 133H70Q56 140 56 153Z"></path><path stroke-width="0" id="E1968-MJMATHI-3B3" d="M31 249Q11 249 11 258Q11 275 26 304T66 365T129 418T206 441Q233 441 239 440Q287 429 318 386T371 255Q385 195 385 170Q385 166 386 166L398 193Q418 244 443 300T486 391T508 430Q510 431 524 431H537Q543 425 543 422Q543 418 522 378T463 251T391 71Q385 55 378 6T357 -100Q341 -165 330 -190T303 -216Q286 -216 286 -188Q286 -138 340 32L346 51L347 69Q348 79 348 100Q348 257 291 317Q251 355 196 355Q148 355 108 329T51 260Q49 251 47 251Q45 249 31 249Z"></path><path stroke-width="0" 
id="E1968-MJMATHI-76" d="M173 380Q173 405 154 405Q130 405 104 376T61 287Q60 286 59 284T58 281T56 279T53 278T49 278T41 278H27Q21 284 21 287Q21 294 29 316T53 368T97 419T160 441Q202 441 225 417T249 361Q249 344 246 335Q246 329 231 291T200 202T182 113Q182 86 187 69Q200 26 250 26Q287 26 319 60T369 139T398 222T409 277Q409 300 401 317T383 343T365 361T357 383Q357 405 376 424T417 443Q436 443 451 425T467 367Q467 340 455 284T418 159T347 40T241 -11Q177 -11 139 22Q102 54 102 117Q102 148 110 181T151 298Q173 362 173 380Z"></path><path stroke-width="0" id="E1968-MJMAIN-2217" d="M229 286Q216 420 216 436Q216 454 240 464Q241 464 245 464T251 465Q263 464 273 456T283 436Q283 419 277 356T270 286L328 328Q384 369 389 372T399 375Q412 375 423 365T435 338Q435 325 425 315Q420 312 357 282T289 250L355 219L425 184Q434 175 434 161Q434 146 425 136T401 125Q393 125 383 131T328 171L270 213Q283 79 283 63Q283 53 276 44T250 35Q231 35 224 44T216 63Q216 80 222 143T229 213L171 171Q115 130 110 127Q106 124 100 124Q87 124 76 134T64 161Q64 166 64 169T67 175T72 181T81 188T94 195T113 204T138 215T170 230T210 250L74 315Q65 324 65 338Q65 353 74 363T98 374Q106 374 116 368T171 328L229 286Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1968-MJMATHI-42" x="0" y="0"></use><use xlink:href="#E1968-MJMAIN-28" x="759" y="0"></use><g transform="translate(1148,0)"><use xlink:href="#E1968-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1968-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1968-MJMAIN-29" x="2116" y="0"></use><use xlink:href="#E1968-MJMAIN-3D" x="2783" y="0"></use><g transform="translate(3838,0)"><use xlink:href="#E1968-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1968-MJMATHI-74" x="778" y="513"></use></g><g transform="translate(4744,0)"><use xlink:href="#E1968-MJMATHI-76" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1968-MJMAIN-2217" x="685" 
y="-213"></use></g><use xlink:href="#E1968-MJMAIN-28" x="5683" y="0"></use><g transform="translate(6072,0)"><use xlink:href="#E1968-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1968-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1968-MJMAIN-29" x="7040" y="0"></use></g></svg></span><script type="math/tex">B(S_t)=\gamma^t v_*(S_t)</script><span>, in which case the gradient takes the form </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="35.36ex" height="2.807ex" viewBox="0 -874.2 15224.4 1208.4" role="img" focusable="false" style="vertical-align: -0.776ex;"><defs><path stroke-width="0" id="E1969-MJMATHI-45" d="M492 213Q472 213 472 226Q472 230 477 250T482 285Q482 316 461 323T364 330H312Q311 328 277 192T243 52Q243 48 254 48T334 46Q428 46 458 48T518 61Q567 77 599 117T670 248Q680 270 683 272Q690 274 698 274Q718 274 718 261Q613 7 608 2Q605 0 322 0H133Q31 0 31 11Q31 13 34 25Q38 41 42 43T65 46Q92 46 125 49Q139 52 144 61Q146 66 215 342T285 622Q285 629 281 629Q273 632 228 634H197Q191 640 191 642T193 659Q197 676 203 680H757Q764 676 764 669Q764 664 751 557T737 447Q735 440 717 440H705Q698 445 698 453L701 476Q704 500 704 528Q704 558 697 578T678 609T643 625T596 632T532 634H485Q397 633 392 631Q388 629 386 622Q385 619 355 499T324 377Q347 376 372 376H398Q464 376 489 391T534 472Q538 488 540 490T557 493Q562 493 565 493T570 492T572 491T574 487T577 483L544 351Q511 218 508 216Q505 213 492 213Z"></path><path stroke-width="0" id="E1969-MJMAIN-5B" d="M118 -250V750H255V710H158V-210H255V-250H118Z"></path><path stroke-width="0" id="E1969-MJMATHI-3B3" d="M31 249Q11 249 11 258Q11 275 26 304T66 365T129 418T206 441Q233 441 239 440Q287 429 318 386T371 255Q385 195 385 170Q385 166 386 166L398 193Q418 244 443 300T486 391T508 430Q510 431 524 431H537Q543 425 543 422Q543 418 522 378T463 251T391 71Q385 55 378 6T357 -100Q341 -165 330 -190T303 -216Q286 -216 286 -188Q286 -138 340 32L346 51L347 69Q348 79 
348 100Q348 257 291 317Q251 355 196 355Q148 355 108 329T51 260Q49 251 47 251Q45 249 31 249Z"></path><path stroke-width="0" id="E1969-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E1969-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1969-MJMATHI-47" d="M50 252Q50 367 117 473T286 641T490 704Q580 704 633 653Q642 643 648 636T656 626L657 623Q660 623 684 649Q691 655 699 663T715 679T725 690L740 705H746Q760 705 760 698Q760 694 728 561Q692 422 692 421Q690 416 687 415T669 413H653Q647 419 647 422Q647 423 648 429T650 449T651 481Q651 552 619 605T510 659Q492 659 471 656T418 643T357 615T294 567T236 496T189 394T158 260Q156 242 156 221Q156 173 170 136T206 79T256 45T308 28T353 24Q407 24 452 47T514 106Q517 114 529 161T541 214Q541 222 528 224T468 227H431Q425 233 425 235T427 254Q431 267 437 273H454Q494 271 594 271Q634 271 659 271T695 272T707 272Q721 272 721 263Q721 261 719 249Q714 230 709 228Q706 227 694 227Q674 227 653 224Q646 221 643 215T629 164Q620 131 614 108Q589 6 586 3Q584 1 581 1Q571 1 553 21T530 52Q530 53 528 52T522 47Q448 -22 322 -22Q201 -22 126 55T50 252Z"></path><path stroke-width="0" id="E1969-MJMAIN-2212" d="M84 237T84 250T98 270H679Q694 262 694 250T679 230H98Q84 237 84 250Z"></path><path stroke-width="0" id="E1969-MJMATHI-76" d="M173 380Q173 405 154 
405Q130 405 104 376T61 287Q60 286 59 284T58 281T56 279T53 278T49 278T41 278H27Q21 284 21 287Q21 294 29 316T53 368T97 419T160 441Q202 441 225 417T249 361Q249 344 246 335Q246 329 231 291T200 202T182 113Q182 86 187 69Q200 26 250 26Q287 26 319 60T369 139T398 222T409 277Q409 300 401 317T383 343T365 361T357 383Q357 405 376 424T417 443Q436 443 451 425T467 367Q467 340 455 284T418 159T347 40T241 -11Q177 -11 139 22Q102 54 102 117Q102 148 110 181T151 298Q173 362 173 380Z"></path><path stroke-width="0" id="E1969-MJMAIN-2217" d="M229 286Q216 420 216 436Q216 454 240 464Q241 464 245 464T251 465Q263 464 273 456T283 436Q283 419 277 356T270 286L328 328Q384 369 389 372T399 375Q412 375 423 365T435 338Q435 325 425 315Q420 312 357 282T289 250L355 219L425 184Q434 175 434 161Q434 146 425 136T401 125Q393 125 383 131T328 171L270 213Q283 79 283 63Q283 53 276 44T250 35Q231 35 224 44T216 63Q216 80 222 143T229 213L171 171Q115 130 110 127Q106 124 100 124Q87 124 76 134T64 161Q64 166 64 169T67 175T72 181T81 188T94 195T113 204T138 215T170 230T210 250L74 315Q65 324 65 338Q65 353 74 363T98 374Q106 374 116 368T171 328L229 286Z"></path><path stroke-width="0" id="E1969-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 462Q176 523 208 573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 577T585 459T569 456Q549 456 549 465Q549 471 550 475Q550 478 551 494T553 520Q553 554 544 579T526 616T501 641Q465 662 419 662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 -18T65 -22Q52 -22 52 -14Q52 -11 110 221Q112 227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 153Q144 114 160 87T203 47T255 29T308 24Z"></path><path stroke-width="0" id="E1969-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 
284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path><path stroke-width="0" id="E1969-MJMAIN-2207" d="M46 676Q46 679 51 683H781Q786 679 786 676Q786 674 617 326T444 -26Q439 -33 416 -33T388 -26Q385 -22 216 326T46 676ZM697 596Q697 597 445 597T193 596Q195 591 319 336T445 80L697 596Z"></path><path stroke-width="0" id="E1969-MJMAIN-6C" d="M42 46H56Q95 46 103 60V68Q103 77 103 91T103 124T104 167T104 217T104 272T104 329Q104 366 104 407T104 482T104 542T103 586T103 603Q100 622 89 628T44 637H26V660Q26 683 28 683L38 684Q48 685 67 686T104 688Q121 689 141 690T171 693T182 694H185V379Q185 62 186 60Q190 52 198 49Q219 46 247 46H263V0H255L232 1Q209 2 183 2T145 3T107 3T57 1L34 0H26V46H42Z"></path><path stroke-width="0" id="E1969-MJMAIN-6E" d="M41 46H55Q94 46 102 60V68Q102 77 102 91T102 122T103 161T103 203Q103 234 103 269T102 328V351Q99 370 88 376T43 385H25V408Q25 431 27 431L37 432Q47 433 65 434T102 436Q119 437 138 438T167 441T178 442H181V402Q181 364 182 364T187 369T199 384T218 402T247 421T285 437Q305 442 336 442Q450 438 463 329Q464 322 464 190V104Q464 66 466 59T477 49Q498 46 526 46H542V0H534L510 1Q487 2 460 2T422 3Q319 3 310 0H302V46H318Q379 46 379 62Q380 64 380 200Q379 335 378 343Q372 371 358 385T334 402T308 404Q263 404 229 370Q202 343 195 315T187 232V168V108Q187 78 188 68T191 55T200 49Q221 46 249 46H265V0H257L234 1Q210 2 183 2T145 3Q42 3 33 0H25V46H41Z"></path><path stroke-width="0" id="E1969-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 
132 -11Z"></path><path stroke-width="0" id="E1969-MJMATHI-41" d="M208 74Q208 50 254 46Q272 46 272 35Q272 34 270 22Q267 8 264 4T251 0Q249 0 239 0T205 1T141 2Q70 2 50 0H42Q35 7 35 11Q37 38 48 46H62Q132 49 164 96Q170 102 345 401T523 704Q530 716 547 716H555H572Q578 707 578 706L606 383Q634 60 636 57Q641 46 701 46Q726 46 726 36Q726 34 723 22Q720 7 718 4T704 0Q701 0 690 0T651 1T578 2Q484 2 455 0H443Q437 6 437 9T439 27Q443 40 445 43L449 46H469Q523 49 533 63L521 213H283L249 155Q208 86 208 74ZM516 260Q516 271 504 416T490 562L463 519Q447 492 400 412L310 260L413 259Q516 259 516 260Z"></path><path stroke-width="0" id="E1969-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path stroke-width="0" id="E1969-MJMAIN-3B" d="M78 370Q78 394 95 412T138 430Q162 430 180 414T199 371Q199 346 182 328T139 310T96 327T78 370ZM78 60Q78 85 94 103T137 121Q202 121 202 8Q202 -44 183 -94T144 -169T118 -194Q115 -194 106 -186T95 -174Q94 -171 107 -155T137 -107T160 -38Q161 -32 162 -22T165 -4T165 4Q165 5 161 4T142 0Q110 0 94 18T78 60Z"></path><path stroke-width="0" id="E1969-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E1969-MJMAIN-5D" d="M22 710V750H159V-250H22V-210H119V710H22Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1969-MJMATHI-45" x="0" y="0"></use><use xlink:href="#E1969-MJMAIN-5B" x="764" y="0"></use><g transform="translate(1042,0)"><use xlink:href="#E1969-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1969-MJMATHI-74" 
x="778" y="513"></use></g><use xlink:href="#E1969-MJMAIN-28" x="1947" y="0"></use><g transform="translate(2336,0)"><use xlink:href="#E1969-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1969-MJMATHI-74" x="1111" y="-213"></use></g><use xlink:href="#E1969-MJMAIN-2212" x="3700" y="0"></use><g transform="translate(4700,0)"><use xlink:href="#E1969-MJMATHI-76" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1969-MJMAIN-2217" x="685" y="-213"></use></g><use xlink:href="#E1969-MJMAIN-28" x="5639" y="0"></use><g transform="translate(6028,0)"><use xlink:href="#E1969-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1969-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1969-MJMAIN-29" x="6996" y="0"></use><use xlink:href="#E1969-MJMAIN-29" x="7385" y="0"></use><use xlink:href="#E1969-MJMAIN-2207" x="7774" y="0"></use><g transform="translate(8774,0)"><use xlink:href="#E1969-MJMAIN-6C"></use><use xlink:href="#E1969-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1969-MJMATHI-3C0" x="9774" y="0"></use><use xlink:href="#E1969-MJMAIN-28" x="10347" y="0"></use><g transform="translate(10736,0)"><use xlink:href="#E1969-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1969-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1969-MJMAIN-2223" x="12119" y="0"></use><g transform="translate(12675,0)"><use xlink:href="#E1969-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1969-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1969-MJMAIN-3B" x="13643" y="0"></use><use xlink:href="#E1969-MJMATHI-3B8" x="14088" y="0"></use><use xlink:href="#E1969-MJMAIN-29" x="14557" y="0"></use><use xlink:href="#E1969-MJMAIN-5D" x="14946" y="0"></use></g></svg></span><script type="math/tex">E[\gamma^t(G_t-v_*(S_t))\nabla \ln \pi(A_t \mid S_t; \theta)]</script><span> 
.</span></li></ul><p><span>In practice, however, the choice of baseline should follow two guidelines:</span></p><ul><li><span>The baseline should effectively reduce variance. Whether a given baseline actually does so is hard to determine theoretically and usually has to be found out empirically.</span></li><li><span>The baseline function should be obtainable. For example, although the optimal value function is unknown, an estimate of it can be obtained.</span></li></ul><p><span>One baseline that effectively reduces variance is an estimate of the state value function; the corresponding algorithm is as follows:</span></p><div contenteditable="false" spellcheck="false" class="mathjax-block md-end-block md-math-block md-rawblock" id="mathjax-n38" cid="n38" mdtype="math_block"><div class="md-rawblock-container md-math-container" tabindex="-1"><div class="MathJax_SVG_Display"><span class="MathJax_SVG" id="MathJax-Element-828-Frame" tabindex="-1" style="font-size: 100%; display: inline-block; zoom: 0.935471;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="105.178ex" height="39.849ex" viewBox="-18.1 -43.5 45284.6 17156.9" role="img" focusable="false" style="vertical-align: -39.747ex; margin-left: -0.042ex; max-width: 100%;"><defs><path stroke-width="0" id="E1930-MJMAINB-37" d="M256 -11Q231 -11 208 5T185 65Q185 105 193 146T212 220T241 289T275 349T312 402T346 445T377 479T397 502L400 504H301Q156 503 150 497Q142 491 134 456T126 407H64V411Q65 414 82 544T99 675T130 676H161V673Q161 669 162 666T167 661T173 657T181 654T190 652T200 651T210 650T220 649T229 648Q237 648 254 647T276 646Q277 646 426 644H558V620V607Q558 596 551 586T509 537Q489 515 476 500Q390 401 384 393Q349 339 337 259T324 113T322 38Q307 -11 256 -11Z"></path><path stroke-width="0" id="E1930-MJMAINB-2D" d="M13 166V278H318V166H13Z"></path><path stroke-width="0" id="E1930-MJMAINB-32" d="M175 580Q175 578 185 572T205 551T215 510Q215 467 191 449T137 430Q107 430 83 448T58 511Q58 558 91 592T168 640T259 654Q328 654 383 637Q451 610 484 563T517 459Q517 401 482 360T368 262Q340 243 265 184L210 140H274Q416 140 429 145Q439 148 447 186T455 237H517V233Q516 230 501 119Q489 9 486 4V0H57V25Q57 51 58 54Q60 57 109 106T215 214T288 291Q364 377 364 458Q364 515 328 553T231 592Q214 592 201 589T181 584T175 580Z"></path><path stroke-width="0" id="E1930-MJMAIN-22EF" d="M78 250Q78 274 95 292T138 310Q162 310 180 
294T199 251Q199 226 182 208T139 190T96 207T78 250ZM525 250Q525 274 542 292T585 310Q609 310 627 294T646 251Q646 226 629 208T586 190T543 207T525 250ZM972 250Q972 274 989 292T1032 310Q1056 310 1074 294T1093 251Q1093 226 1076 208T1033 190T990 207T972 250Z"></path><path stroke-width="0" id="E1930-MJMAIN-37" d="M55 458Q56 460 72 567L88 674Q88 676 108 676H128V672Q128 662 143 655T195 646T364 644H485V605L417 512Q408 500 387 472T360 435T339 403T319 367T305 330T292 284T284 230T278 162T275 80Q275 66 275 52T274 28V19Q270 2 255 -10T221 -22Q210 -22 200 -19T179 0T168 40Q168 198 265 368Q285 400 349 489L395 552H302Q128 552 119 546Q113 543 108 522T98 479L95 458V455H55V458Z"></path><path stroke-width="0" id="E1930-MJMAIN-2D" d="M11 179V252H277V179H11Z"></path><path stroke-width="0" id="E1930-MJMAIN-31" d="M213 578L200 573Q186 568 160 563T102 556H83V602H102Q149 604 189 617T245 641T273 663Q275 666 285 666Q294 666 302 660V361L303 61Q310 54 315 52T339 48T401 46H427V0H416Q395 3 257 3Q121 3 100 0H88V46H114Q136 46 152 46T177 47T193 50T201 52T207 57T213 61V578Z"></path><path stroke-width="0" id="E1930-MJMATHI-3B1" d="M34 156Q34 270 120 356T309 442Q379 442 421 402T478 304Q484 275 485 237V208Q534 282 560 374Q564 388 566 390T582 393Q603 393 603 385Q603 376 594 346T558 261T497 161L486 147L487 123Q489 67 495 47T514 26Q528 28 540 37T557 60Q559 67 562 68T577 70Q597 70 597 62Q597 56 591 43Q579 19 556 5T512 -10H505Q438 -10 414 62L411 69L400 61Q390 53 370 41T325 18T267 -2T203 -11Q124 -11 79 39T34 156ZM208 26Q257 26 306 47T379 90L403 112Q401 255 396 290Q382 405 304 405Q235 405 183 332Q156 292 139 224T121 120Q121 71 146 49T208 26Z"></path><path stroke-width="0" id="E1930-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1930-MJMAINB-77" d="M624 444Q636 441 722 
441Q797 441 800 444H805V382H741L593 11Q592 10 590 8T586 4T584 2T581 0T579 -2T575 -3T571 -3T567 -4T561 -4T553 -4H542Q525 -4 518 6T490 70Q474 110 463 137L415 257L367 137Q357 111 341 72Q320 17 313 7T289 -4H277Q259 -4 253 -2T238 11L90 382H25V444H32Q47 441 140 441Q243 441 261 444H270V382H222L310 164L382 342L366 382H303V444H310Q322 441 407 441Q508 441 523 444H531V382H506Q481 382 481 380Q482 376 529 259T577 142L674 382H617V444H624Z"></path><path stroke-width="0" id="E1930-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path><path stroke-width="0" id="E1930-MJMAIN-2C" d="M78 35T78 60T94 103T137 121Q165 121 187 96T210 8Q210 -27 201 -60T180 -117T154 -158T130 -185T117 -194Q113 -194 104 -185T95 -172Q95 -168 106 -156T131 -126T157 -76T173 -3V9L172 8Q170 7 167 6T161 3T152 1T140 0Q113 0 96 17Z"></path><path stroke-width="0" id="E1930-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E1930-MJMATHI-3B3" d="M31 249Q11 249 11 258Q11 275 26 304T66 365T129 418T206 441Q233 441 239 440Q287 429 318 386T371 255Q385 195 385 170Q385 166 386 166L398 193Q418 244 443 300T486 391T508 430Q510 431 524 431H537Q543 425 543 422Q543 418 522 378T463 251T391 71Q385 55 378 6T357 -100Q341 -165 330 -190T303 -216Q286 -216 286 -188Q286 -138 340 32L346 51L347 69Q348 79 348 100Q348 257 291 317Q251 355 196 355Q148 355 108 329T51 260Q49 251 47 251Q45 249 31 249Z"></path><path stroke-width="0" id="E1930-MJMAIN-2E" 
d="M78 60Q78 84 95 102T138 120Q162 120 180 104T199 61Q199 36 182 18T139 0T96 17T78 60Z"></path><path stroke-width="0" id="E1930-MJMAIN-2190" d="M944 261T944 250T929 230H165Q167 228 182 216T211 189T244 152T277 96T303 25Q308 7 308 0Q308 -11 288 -11Q281 -11 278 -11T272 -7T267 2T263 21Q245 94 195 151T73 236Q58 242 55 247Q55 254 59 257T73 264Q121 283 158 314T215 375T247 434T264 480L267 497Q269 503 270 505T275 509T288 511Q308 511 308 500Q308 493 303 475Q293 438 278 406T246 352T215 315T185 287T165 270H929Q944 261 944 250Z"></path><path stroke-width="0" id="E1930-MJMAIN-32" d="M109 429Q82 429 66 447T50 491Q50 562 103 614T235 666Q326 666 387 610T449 465Q449 422 429 383T381 315T301 241Q265 210 201 149L142 93L218 92Q375 92 385 97Q392 99 409 186V189H449V186Q448 183 436 95T421 3V0H50V19V31Q50 38 56 46T86 81Q115 113 136 137Q145 147 170 174T204 211T233 244T261 278T284 308T305 340T320 369T333 401T340 431T343 464Q343 527 309 573T212 619Q179 619 154 602T119 569T109 550Q109 549 114 549Q132 549 151 535T170 489Q170 464 154 447T109 429Z"></path><path stroke-width="0" id="E1930-MJMAIN-33" d="M127 463Q100 463 85 480T69 524Q69 579 117 622T233 665Q268 665 277 664Q351 652 390 611T430 522Q430 470 396 421T302 350L299 348Q299 347 308 345T337 336T375 315Q457 262 457 175Q457 96 395 37T238 -22Q158 -22 100 21T42 130Q42 158 60 175T105 193Q133 193 151 175T169 130Q169 119 166 110T159 94T148 82T136 74T126 70T118 67L114 66Q165 21 238 21Q293 21 321 74Q338 107 338 175V195Q338 290 274 322Q259 328 213 329L171 330L168 332Q166 335 166 348Q166 366 174 366Q202 366 232 371Q266 376 294 413T322 525V533Q322 590 287 612Q265 626 240 626Q208 626 181 615T143 592T132 580H135Q138 579 143 578T153 573T165 566T175 555T183 540T186 520Q186 498 172 481T127 463Z"></path><path stroke-width="0" id="E1930-MJMAIN-5B" d="M118 -250V750H255V710H158V-210H255V-250H118Z"></path><path stroke-width="0" id="E1930-MJMATHI-47" d="M50 252Q50 367 117 473T286 641T490 704Q580 704 633 653Q642 643 648 636T656 626L657 623Q660 623 684 649Q691 655 699 
663T715 679T725 690L740 705H746Q760 705 760 698Q760 694 728 561Q692 422 692 421Q690 416 687 415T669 413H653Q647 419 647 422Q647 423 648 429T650 449T651 481Q651 552 619 605T510 659Q492 659 471 656T418 643T357 615T294 567T236 496T189 394T158 260Q156 242 156 221Q156 173 170 136T206 79T256 45T308 28T353 24Q407 24 452 47T514 106Q517 114 529 161T541 214Q541 222 528 224T468 227H431Q425 233 425 235T427 254Q431 267 437 273H454Q494 271 594 271Q634 271 659 271T695 272T707 272Q721 272 721 263Q721 261 719 249Q714 230 709 228Q706 227 694 227Q674 227 653 224Q646 221 643 215T629 164Q620 131 614 108Q589 6 586 3Q584 1 581 1Q571 1 553 21T530 52Q530 53 528 52T522 47Q448 -22 322 -22Q201 -22 126 55T50 252Z"></path><path stroke-width="0" id="E1930-MJMAIN-2212" d="M84 237T84 250T98 270H679Q694 262 694 250T679 230H98Q84 237 84 250Z"></path><path stroke-width="0" id="E1930-MJMATHI-76" d="M173 380Q173 405 154 405Q130 405 104 376T61 287Q60 286 59 284T58 281T56 279T53 278T49 278T41 278H27Q21 284 21 287Q21 294 29 316T53 368T97 419T160 441Q202 441 225 417T249 361Q249 344 246 335Q246 329 231 291T200 202T182 113Q182 86 187 69Q200 26 250 26Q287 26 319 60T369 139T398 222T409 277Q409 300 401 317T383 343T365 361T357 383Q357 405 376 424T417 443Q436 443 451 425T467 367Q467 340 455 284T418 159T347 40T241 -11Q177 -11 139 22Q102 54 102 117Q102 148 110 181T151 298Q173 362 173 380Z"></path><path stroke-width="0" id="E1930-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 462Q176 523 208 573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 577T585 459T569 456Q549 456 549 465Q549 471 550 475Q550 478 551 494T553 520Q553 554 544 579T526 616T501 641Q465 662 419 662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 -18T65 -22Q52 -22 
52 -14Q52 -11 110 221Q112 227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 153Q144 114 160 87T203 47T255 29T308 24Z"></path><path stroke-width="0" id="E1930-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E1930-MJMAIN-3B" d="M78 370Q78 394 95 412T138 430Q162 430 180 414T199 371Q199 346 182 328T139 310T96 327T78 370ZM78 60Q78 85 94 103T137 121Q202 121 202 8Q202 -44 183 -94T144 -169T118 -194Q115 -194 106 -186T95 -174Q94 -171 107 -155T137 -107T160 -38Q161 -32 162 -22T165 -4T165 4Q165 5 161 4T142 0Q110 0 94 18T78 60Z"></path><path stroke-width="0" id="E1930-MJMAIN-5D" d="M22 710V750H159V-250H22V-210H119V710H22Z"></path><path stroke-width="0" id="E1930-MJMAIN-2B" d="M56 237T56 250T70 270H369V420L370 570Q380 583 389 583Q402 583 409 568V270H707Q722 262 722 250T707 230H409V-68Q401 -82 391 -82H389H387Q375 -82 369 -68V230H70Q56 237 56 250Z"></path><path stroke-width="0" id="E1930-MJMAIN-2207" d="M46 676Q46 679 51 683H781Q786 679 786 676Q786 674 617 326T444 -26Q439 -33 416 -33T388 -26Q385 -22 216 326T46 676ZM697 596Q697 597 445 597T193 596Q195 591 319 336T445 80L697 596Z"></path><path stroke-width="0" id="E1930-MJMAIN-6C" d="M42 46H56Q95 46 103 60V68Q103 77 103 91T103 124T104 167T104 217T104 272T104 329Q104 366 104 407T104 482T104 542T103 586T103 603Q100 622 89 628T44 637H26V660Q26 683 28 683L38 684Q48 685 67 686T104 688Q121 689 141 690T171 693T182 694H185V379Q185 62 186 60Q190 52 198 49Q219 46 247 46H263V0H255L232 1Q209 2 183 2T145 3T107 3T57 1L34 0H26V46H42Z"></path><path 
stroke-width="0" id="E1930-MJMAIN-6E" d="M41 46H55Q94 46 102 60V68Q102 77 102 91T102 122T103 161T103 203Q103 234 103 269T102 328V351Q99 370 88 376T43 385H25V408Q25 431 27 431L37 432Q47 433 65 434T102 436Q119 437 138 438T167 441T178 442H181V402Q181 364 182 364T187 369T199 384T218 402T247 421T285 437Q305 442 336 442Q450 438 463 329Q464 322 464 190V104Q464 66 466 59T477 49Q498 46 526 46H542V0H534L510 1Q487 2 460 2T422 3Q319 3 310 0H302V46H318Q379 46 379 62Q380 64 380 200Q379 335 378 343Q372 371 358 385T334 402T308 404Q263 404 229 370Q202 343 195 315T187 232V168V108Q187 78 188 68T191 55T200 49Q221 46 249 46H265V0H257L234 1Q210 2 183 2T145 3Q42 3 33 0H25V46H41Z"></path><path stroke-width="0" id="E1930-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1930-MJMATHI-41" d="M208 74Q208 50 254 46Q272 46 272 35Q272 34 270 22Q267 8 264 4T251 0Q249 0 239 0T205 1T141 2Q70 2 50 0H42Q35 7 35 11Q37 38 48 46H62Q132 49 164 96Q170 102 345 401T523 704Q530 716 547 716H555H572Q578 707 578 706L606 383Q634 60 636 57Q641 46 701 46Q726 46 726 36Q726 34 723 22Q720 7 718 4T704 0Q701 0 690 0T651 1T578 2Q484 2 455 0H443Q437 6 437 9T439 27Q443 40 445 43L449 46H469Q523 49 533 63L521 213H283L249 155Q208 86 208 74ZM516 260Q516 271 504 416T490 562L463 519Q447 492 400 412L310 260L413 259Q516 259 516 260Z"></path><path stroke-width="0" id="E1930-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" 
transform="matrix(1 0 0 -1 0 0)"><g transform="translate(9267,-2462)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">算</text><g transform="translate(1052,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">法</text></g><use transform="scale(1.2)" xlink:href="#E1930-MJMAINB-37" x="1963" y="0"></use><use transform="scale(1.2)" xlink:href="#E1930-MJMAINB-2D" x="2538" y="0"></use><use transform="scale(1.2)" xlink:href="#E1930-MJMAINB-32" x="2921" y="0"></use><g transform="translate(4945,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">带</text></g><g transform="translate(5998,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">基</text></g><g transform="translate(7050,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">线</text></g><g transform="translate(8067,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">的</text></g><g transform="translate(9083,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">简</text></g><g transform="translate(10099,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">单</text></g><g transform="translate(11152,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">策</text></g><g transform="translate(12205,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" 
stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">略</text></g><g transform="translate(13258,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">梯</text></g><g transform="translate(14311,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">度</text></g><g transform="translate(15363,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">算</text></g><g transform="translate(16416,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">法</text></g><g transform="translate(17469,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">求</text></g><g transform="translate(18522,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">解</text></g><g transform="translate(19575,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">最</text></g><g transform="translate(20628,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">优</text></g><g transform="translate(21681,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">策</text></g><g transform="translate(22733,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">略</text></g></g><g transform="translate(0,-9485)"><g transform="translate(-19,0)"><g transform="translate(0,5625)"><g><rect fill="black" 
stroke="none" width="1569" height="100" x="0" y="500"></rect></g></g><g transform="translate(0,-5426)"><g><rect fill="black" stroke="none" width="1569" height="100" x="0" y="-500"></rect></g></g></g><g transform="translate(1551,0)"><g transform="translate(0,5625)"><g><rect fill="black" stroke="none" width="41597" height="100" x="0" y="500"></rect></g></g><g transform="translate(0,4325)"><use xlink:href="#E1930-MJMAIN-22EF" x="166" y="0"></use><g transform="translate(2505,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">同</text><g transform="translate(830,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">算</text></g><g transform="translate(1661,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">法</text></g><use xlink:href="#E1930-MJMAIN-37" x="2741" y="0"></use><use xlink:href="#E1930-MJMAIN-2D" x="3241" y="0"></use><use xlink:href="#E1930-MJMAIN-31" x="3574" y="0"></use></g><use xlink:href="#E1930-MJMAIN-22EF" x="7746" y="0"></use></g><g transform="translate(0,2882)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">参</text><g transform="translate(830,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">数</text></g><g transform="translate(1661,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">：</text></g><g transform="translate(2491,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">优</text></g><g transform="translate(3322,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">化</text></g><g transform="translate(4153,0)"><text 
font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">器</text></g><g transform="translate(4983,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">（</text></g><g transform="translate(5814,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">隐</text></g><g transform="translate(6645,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">含</text></g><g transform="translate(7475,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">学</text></g><g transform="translate(8306,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">习</text></g><g transform="translate(9137,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">率</text></g><g transform="translate(10217,0)"><use xlink:href="#E1930-MJMATHI-3B1" x="0" y="0"></use><g transform="translate(640,412)"><use transform="scale(0.707)" xlink:href="#E1930-MJMAIN-28" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1930-MJMAINB-77" x="389" y="0"></use><use transform="scale(0.707)" xlink:href="#E1930-MJMAIN-29" x="1220" y="0"></use></g><use xlink:href="#E1930-MJMAIN-2C" x="1877" y="0"></use><g transform="translate(2322,0)"><use xlink:href="#E1930-MJMATHI-3B1" x="0" y="0"></use><g transform="translate(640,412)"><use transform="scale(0.707)" xlink:href="#E1930-MJMAIN-28" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1930-MJMATHI-3B8" x="389" y="0"></use><use transform="scale(0.707)" xlink:href="#E1930-MJMAIN-29" x="858" y="0"></use></g></g></g><g transform="translate(14161,0)"><g transform="translate(250,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" 
stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">）</text></g><g transform="translate(1080,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">，</text></g><g transform="translate(1911,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">折</text></g><g transform="translate(2741,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">扣</text></g><g transform="translate(3572,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">因</text></g><g transform="translate(4403,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">子</text></g></g><use xlink:href="#E1930-MJMATHI-3B3" x="19645" y="0"></use><g transform="translate(20188,0)"><g transform="translate(250,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">，</text></g><g transform="translate(1080,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">控</text></g><g transform="translate(1911,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">制</text></g><g transform="translate(2741,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">回</text></g><g transform="translate(3572,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">合</text></g><g transform="translate(4403,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">数</text></g><g transform="translate(5233,0)"><text font-family="STIXGeneral, 
'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">和</text></g><g transform="translate(6064,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">回</text></g><g transform="translate(6895,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">合</text></g><g transform="translate(7725,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">内</text></g><g transform="translate(8556,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">步</text></g><g transform="translate(9387,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">数</text></g><g transform="translate(10217,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">的</text></g><g transform="translate(11048,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">参</text></g><g transform="translate(11879,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">数</text></g><g transform="translate(12709,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">。</text></g></g></g><g transform="translate(0,1566)"><use xlink:href="#E1930-MJMAIN-31"></use><use xlink:href="#E1930-MJMAIN-2E" x="500" y="0"></use><g transform="translate(778,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">（</text></g><g transform="translate(1608,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">初</text></g><g 
transform="translate(2439,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">始</text></g><g transform="translate(3269,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">化</text></g><g transform="translate(4100,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">）</text></g><g transform="translate(4931,0)"><use xlink:href="#E1930-MJMATHI-3B8" x="0" y="0"></use><use xlink:href="#E1930-MJMAIN-2190" x="746" y="0"></use></g><g transform="translate(6678,0)"><g transform="translate(250,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">任</text></g><g transform="translate(1080,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">意</text></g><g transform="translate(1911,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">值</text></g><g transform="translate(2741,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">，</text></g></g><g transform="translate(10250,0)"><use xlink:href="#E1930-MJMAINB-77" x="0" y="0"></use><use xlink:href="#E1930-MJMAIN-2190" x="1108" y="0"></use></g><g transform="translate(12359,0)"><g transform="translate(250,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">任</text></g><g transform="translate(1080,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">意</text></g><g transform="translate(1911,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">值</text></g><g 
transform="translate(2741,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">。</text></g></g></g><g transform="translate(0,266)"><use xlink:href="#E1930-MJMAIN-22EF" x="166" y="0"></use><g transform="translate(2505,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">同</text><g transform="translate(830,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">算</text></g><g transform="translate(1661,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">法</text></g><use xlink:href="#E1930-MJMAIN-37" x="2741" y="0"></use><use xlink:href="#E1930-MJMAIN-2D" x="3241" y="0"></use><use xlink:href="#E1930-MJMAIN-31" x="3574" y="0"></use></g><use xlink:href="#E1930-MJMAIN-22EF" x="7746" y="0"></use></g><g transform="translate(0,-1177)"><g transform="translate(4000,0)"><use xlink:href="#E1930-MJMAIN-32"></use><use xlink:href="#E1930-MJMAIN-2E" x="500" y="0"></use><use xlink:href="#E1930-MJMAIN-33" x="778" y="0"></use><use xlink:href="#E1930-MJMAIN-2E" x="1278" y="0"></use><use xlink:href="#E1930-MJMAIN-32" x="1556" y="0"></use><g transform="translate(2056,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">（</text></g><g transform="translate(2886,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">更</text></g><g transform="translate(3717,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">新</text></g><g transform="translate(4547,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">价</text></g><g transform="translate(5378,0)"><text font-family="STIXGeneral, 
'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">值</text></g><g transform="translate(6209,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">）</text></g><g transform="translate(7039,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">更</text></g><g transform="translate(7870,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">新</text></g><use xlink:href="#E1930-MJMAINB-77" x="8951" y="0"></use><g transform="translate(9782,0)"><g transform="translate(250,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">以</text></g><g transform="translate(1080,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">减</text></g><g transform="translate(1911,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">小</text></g></g><g transform="translate(12774,0)"><use xlink:href="#E1930-MJMAIN-5B" x="0" y="0"></use><use xlink:href="#E1930-MJMATHI-47" x="278" y="0"></use><use xlink:href="#E1930-MJMAIN-2212" x="1286" y="0"></use><use xlink:href="#E1930-MJMATHI-76" x="2286" y="0"></use><use xlink:href="#E1930-MJMAIN-28" x="2771" y="0"></use><g transform="translate(3160,0)"><use xlink:href="#E1930-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1930-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1930-MJMAIN-3B" x="4128" y="0"></use><use xlink:href="#E1930-MJMAINB-77" x="4573" y="0"></use><use xlink:href="#E1930-MJMAIN-29" x="5404" y="0"></use><g transform="translate(5793,0)"><use xlink:href="#E1930-MJMAIN-5D" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1930-MJMAIN-32" x="393" y="583"></use></g></g><g 
transform="translate(19299,0)"><g transform="translate(250,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">，</text></g><g transform="translate(1080,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">如</text></g></g><g transform="translate(21460,0)"><use xlink:href="#E1930-MJMAINB-77" x="0" y="0"></use><use xlink:href="#E1930-MJMAIN-2190" x="1108" y="0"></use><use xlink:href="#E1930-MJMAINB-77" x="2386" y="0"></use><use xlink:href="#E1930-MJMAIN-2B" x="3439" y="0"></use><g transform="translate(4440,0)"><use xlink:href="#E1930-MJMATHI-3B1" x="0" y="0"></use><g transform="translate(640,412)"><use transform="scale(0.707)" xlink:href="#E1930-MJMAIN-28" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1930-MJMAINB-77" x="389" y="0"></use><use transform="scale(0.707)" xlink:href="#E1930-MJMAIN-29" x="1220" y="0"></use></g></g><use xlink:href="#E1930-MJMAIN-5B" x="6317" y="0"></use><use xlink:href="#E1930-MJMATHI-47" x="6595" y="0"></use><use xlink:href="#E1930-MJMAIN-2212" x="7603" y="0"></use><use xlink:href="#E1930-MJMATHI-76" x="8604" y="0"></use><use xlink:href="#E1930-MJMAIN-28" x="9089" y="0"></use><g transform="translate(9478,0)"><use xlink:href="#E1930-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1930-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1930-MJMAIN-3B" x="10446" y="0"></use><use xlink:href="#E1930-MJMAINB-77" x="10891" y="0"></use><use xlink:href="#E1930-MJMAIN-29" x="11722" y="0"></use><use xlink:href="#E1930-MJMAIN-5D" x="12111" y="0"></use><use xlink:href="#E1930-MJMAIN-2207" x="12389" y="0"></use><use xlink:href="#E1930-MJMATHI-76" x="13222" y="0"></use><use xlink:href="#E1930-MJMAIN-28" x="13707" y="0"></use><g transform="translate(14096,0)"><use xlink:href="#E1930-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1930-MJMATHI-74" 
x="866" y="-213"></use></g><use xlink:href="#E1930-MJMAIN-3B" x="15064" y="0"></use><use xlink:href="#E1930-MJMAINB-77" x="15509" y="0"></use><use xlink:href="#E1930-MJMAIN-29" x="16340" y="0"></use></g><g transform="translate(38189,0)"><g transform="translate(250,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">；</text></g></g></g></g><g transform="translate(0,-2583)"><g transform="translate(4000,0)"><use xlink:href="#E1930-MJMAIN-32"></use><use xlink:href="#E1930-MJMAIN-2E" x="500" y="0"></use><use xlink:href="#E1930-MJMAIN-33" x="778" y="0"></use><use xlink:href="#E1930-MJMAIN-2E" x="1278" y="0"></use><use xlink:href="#E1930-MJMAIN-33" x="1556" y="0"></use><g transform="translate(2056,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">（</text></g><g transform="translate(2886,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">更</text></g><g transform="translate(3717,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">新</text></g><g transform="translate(4547,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">策</text></g><g transform="translate(5378,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">略</text></g><g transform="translate(6209,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">）</text></g><g transform="translate(7039,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">更</text></g><g transform="translate(7870,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 
0 0)">新</text></g><use xlink:href="#E1930-MJMATHI-3B8" x="8951" y="0"></use><g transform="translate(9420,0)"><g transform="translate(250,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">以</text></g><g transform="translate(1080,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">减</text></g><g transform="translate(1911,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">小</text></g></g><g transform="translate(12412,0)"><use xlink:href="#E1930-MJMAIN-2212" x="0" y="0"></use><g transform="translate(778,0)"><use xlink:href="#E1930-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1930-MJMATHI-74" x="778" y="583"></use></g><use xlink:href="#E1930-MJMAIN-5B" x="1683" y="0"></use><use xlink:href="#E1930-MJMATHI-47" x="1961" y="0"></use><use xlink:href="#E1930-MJMAIN-2212" x="2970" y="0"></use><use xlink:href="#E1930-MJMATHI-76" x="3970" y="0"></use><use xlink:href="#E1930-MJMAIN-28" x="4455" y="0"></use><g transform="translate(4844,0)"><use xlink:href="#E1930-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1930-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1930-MJMAIN-3B" x="5812" y="0"></use><use xlink:href="#E1930-MJMAINB-77" x="6257" y="0"></use><use xlink:href="#E1930-MJMAIN-29" x="7088" y="0"></use><use xlink:href="#E1930-MJMAIN-5D" x="7477" y="0"></use><g transform="translate(7921,0)"><use xlink:href="#E1930-MJMAIN-6C"></use><use xlink:href="#E1930-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1930-MJMATHI-3C0" x="8922" y="0"></use><use xlink:href="#E1930-MJMAIN-28" x="9495" y="0"></use><g transform="translate(9884,0)"><use xlink:href="#E1930-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1930-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1930-MJMAIN-2223" 
x="11267" y="0"></use><g transform="translate(11823,0)"><use xlink:href="#E1930-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1930-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1930-MJMAIN-3B" x="12791" y="0"></use><use xlink:href="#E1930-MJMATHI-3B8" x="13236" y="0"></use><use xlink:href="#E1930-MJMAIN-29" x="13705" y="0"></use></g><g transform="translate(26506,0)"><g transform="translate(250,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">，</text></g></g></g></g><g transform="translate(0,-4076)"><g transform="translate(6444,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">如</text><g transform="translate(1080,0)"><use xlink:href="#E1930-MJMATHI-3B8" x="0" y="0"></use><use xlink:href="#E1930-MJMAIN-2190" x="746" y="0"></use><use xlink:href="#E1930-MJMATHI-3B8" x="2024" y="0"></use><use xlink:href="#E1930-MJMAIN-2B" x="2715" y="0"></use><g transform="translate(3716,0)"><use xlink:href="#E1930-MJMATHI-3B1" x="0" y="0"></use><g transform="translate(640,412)"><use transform="scale(0.707)" xlink:href="#E1930-MJMAIN-28" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1930-MJMATHI-3B8" x="389" y="0"></use><use transform="scale(0.707)" xlink:href="#E1930-MJMAIN-29" x="858" y="0"></use></g></g><g transform="translate(5337,0)"><use xlink:href="#E1930-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1930-MJMATHI-74" x="778" y="583"></use></g><use xlink:href="#E1930-MJMAIN-5B" x="6243" y="0"></use><use xlink:href="#E1930-MJMATHI-47" x="6521" y="0"></use><use xlink:href="#E1930-MJMAIN-2212" x="7529" y="0"></use><use xlink:href="#E1930-MJMATHI-76" x="8530" y="0"></use><use xlink:href="#E1930-MJMAIN-28" x="9015" y="0"></use><g transform="translate(9404,0)"><use xlink:href="#E1930-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" 
xlink:href="#E1930-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1930-MJMAIN-3B" x="10372" y="0"></use><use xlink:href="#E1930-MJMAINB-77" x="10816" y="0"></use><use xlink:href="#E1930-MJMAIN-29" x="11647" y="0"></use><use xlink:href="#E1930-MJMAIN-5D" x="12036" y="0"></use><use xlink:href="#E1930-MJMAIN-2207" x="12314" y="0"></use><g transform="translate(13314,0)"><use xlink:href="#E1930-MJMAIN-6C"></use><use xlink:href="#E1930-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1930-MJMATHI-3C0" x="14315" y="0"></use><use xlink:href="#E1930-MJMAIN-28" x="14888" y="0"></use><g transform="translate(15277,0)"><use xlink:href="#E1930-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1930-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1930-MJMAIN-2223" x="16660" y="0"></use><g transform="translate(17216,0)"><use xlink:href="#E1930-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1930-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1930-MJMAIN-3B" x="18184" y="0"></use><use xlink:href="#E1930-MJMATHI-3B8" x="18629" y="0"></use><use xlink:href="#E1930-MJMAIN-29" x="19098" y="0"></use></g><g transform="translate(20567,0)"><g transform="translate(250,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">。</text></g></g></g></g><g transform="translate(0,-5426)"><g><rect fill="black" stroke="none" width="41597" height="100" x="0" y="-500"></rect></g></g></g></g></g></svg></span></div><script type="math/tex; mode=display" id="MathJax-Element-828">\; \\ \; \\
\large \textbf{Algorithm 7-2   Simple policy gradient algorithm with baseline for finding the optimal policy} \\
\begin{split}
\rule[5pt]{10mm}{0.1em} &\rule[5pt]{265mm}{0.1em} \\
&\cdots \quad \text{same as Algorithm 7-1} \quad \cdots \\
&\text{Parameters: optimizers (which imply the learning rates $\alpha^{(\bold w)}, \alpha^{(\theta)}$), discount factor $\gamma$, and parameters that control the number of episodes and the number of steps per episode.} \\
&\text{1. (Initialization) $\theta \leftarrow$ arbitrary value, $\bold w \leftarrow$ arbitrary value.} \\
&\cdots \quad \text{same as Algorithm 7-1} \quad \cdots \\
&\qquad \qquad \text{2.3.2 (Update value) Update $\bold w$ to reduce $[G-v(S_t;\bold w)]^2$, e.g. $\bold w \leftarrow \bold w + \alpha^{(\bold w)}[G-v(S_t;\bold w)]\nabla v(S_t;\bold w)$;} \\
&\qquad \qquad \text{2.3.3 (Update policy) Update $\theta$ to reduce $-\gamma^t [G-v(S_t;\bold w)] \ln \pi(A_t \mid S_t; \theta)$,} \\
&\qquad \qquad \qquad \;\, \text{e.g. $\theta \leftarrow \theta + \alpha^{(\theta)}\gamma^t [G-v(S_t;\bold w)] \nabla \ln \pi(A_t \mid S_t; \theta)$.} \\
\rule[-5pt]{10mm}{0.1em} &\rule[-5pt]{265mm}{0.1em}
\end{split}
\; \\ \; \\</script></div></div><p><span>Next, we analyze which baseline function reduces the variance the most. Consider </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="34.943ex" height="2.807ex" viewBox="0 -874.2 15044.9 1208.4" role="img" focusable="false" style="vertical-align: -0.776ex;"><defs><path stroke-width="0" id="E1970-MJMATHI-45" d="M492 213Q472 213 472 226Q472 230 477 250T482 285Q482 316 461 323T364 330H312Q311 328 277 192T243 52Q243 48 254 48T334 46Q428 46 458 48T518 61Q567 77 599 117T670 248Q680 270 683 272Q690 274 698 274Q718 274 718 261Q613 7 608 2Q605 0 322 0H133Q31 0 31 11Q31 13 34 25Q38 41 42 43T65 46Q92 46 125 49Q139 52 144 61Q146 66 215 342T285 622Q285 629 281 629Q273 632 228 634H197Q191 640 191 642T193 659Q197 676 203 680H757Q764 676 764 669Q764 664 751 557T737 447Q735 440 717 440H705Q698 445 698 453L701 476Q704 500 704 528Q704 558 697 578T678 609T643 625T596 632T532 634H485Q397 633 392 631Q388 629 386 622Q385 619 355 499T324 377Q347 376 372 376H398Q464 376 489 391T534 472Q538 488 540 490T557 493Q562 493 565 493T570 492T572 491T574 487T577 483L544 351Q511 218 508 216Q505 213 492 213Z"></path><path stroke-width="0" id="E1970-MJMAIN-5B" d="M118 -250V750H255V710H158V-210H255V-250H118Z"></path><path stroke-width="0" id="E1970-MJMATHI-3B3" d="M31 249Q11 249 11 258Q11 275 26 304T66 365T129 418T206 441Q233 441 239 440Q287 429 318 386T371 255Q385 195 385 170Q385 166 386 166L398 193Q418 244 443 300T486 391T508 430Q510 431 524 431H537Q543 425 543 422Q543 418 522 378T463 251T391 71Q385 55 378 6T357 -100Q341 -165 330 -190T303 -216Q286 -216 286 -188Q286 -138 340 32L346 51L347 69Q348 79 348 100Q348 257 291 317Q251 355 196 355Q148 355 108 329T51 260Q49 251 47 251Q45 249 31 249Z"></path><path stroke-width="0" id="E1970-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 
549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E1970-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1970-MJMATHI-47" d="M50 252Q50 367 117 473T286 641T490 704Q580 704 633 653Q642 643 648 636T656 626L657 623Q660 623 684 649Q691 655 699 663T715 679T725 690L740 705H746Q760 705 760 698Q760 694 728 561Q692 422 692 421Q690 416 687 415T669 413H653Q647 419 647 422Q647 423 648 429T650 449T651 481Q651 552 619 605T510 659Q492 659 471 656T418 643T357 615T294 567T236 496T189 394T158 260Q156 242 156 221Q156 173 170 136T206 79T256 45T308 28T353 24Q407 24 452 47T514 106Q517 114 529 161T541 214Q541 222 528 224T468 227H431Q425 233 425 235T427 254Q431 267 437 273H454Q494 271 594 271Q634 271 659 271T695 272T707 272Q721 272 721 263Q721 261 719 249Q714 230 709 228Q706 227 694 227Q674 227 653 224Q646 221 643 215T629 164Q620 131 614 108Q589 6 586 3Q584 1 581 1Q571 1 553 21T530 52Q530 53 528 52T522 47Q448 -22 322 -22Q201 -22 126 55T50 252Z"></path><path stroke-width="0" id="E1970-MJMAIN-2212" d="M84 237T84 250T98 270H679Q694 262 694 250T679 230H98Q84 237 84 250Z"></path><path stroke-width="0" id="E1970-MJMATHI-42" d="M231 637Q204 637 199 638T194 649Q194 676 205 682Q206 683 335 683Q594 683 608 681Q671 671 713 636T756 544Q756 480 698 429T565 360L555 357Q619 348 660 311T702 219Q702 146 630 78T453 1Q446 0 242 0Q42 0 39 2Q35 5 35 10Q35 17 37 24Q42 43 47 45Q51 46 62 46H68Q95 46 128 49Q142 52 147 61Q150 65 219 339T288 628Q288 635 231 637ZM649 544Q649 574 634 
600T585 634Q578 636 493 637Q473 637 451 637T416 636H403Q388 635 384 626Q382 622 352 506Q352 503 351 500L320 374H401Q482 374 494 376Q554 386 601 434T649 544ZM595 229Q595 273 572 302T512 336Q506 337 429 337Q311 337 310 336Q310 334 293 263T258 122L240 52Q240 48 252 48T333 46Q422 46 429 47Q491 54 543 105T595 229Z"></path><path stroke-width="0" id="E1970-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 462Q176 523 208 573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 577T585 459T569 456Q549 456 549 465Q549 471 550 475Q550 478 551 494T553 520Q553 554 544 579T526 616T501 641Q465 662 419 662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 -18T65 -22Q52 -22 52 -14Q52 -11 110 221Q112 227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 153Q144 114 160 87T203 47T255 29T308 24Z"></path><path stroke-width="0" id="E1970-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path><path stroke-width="0" id="E1970-MJMAIN-2207" d="M46 676Q46 679 51 683H781Q786 679 786 676Q786 674 617 326T444 -26Q439 -33 416 -33T388 -26Q385 -22 216 326T46 676ZM697 596Q697 597 445 597T193 596Q195 591 319 336T445 80L697 596Z"></path><path stroke-width="0" id="E1970-MJMAIN-6C" d="M42 46H56Q95 46 103 60V68Q103 77 103 91T103 124T104 167T104 217T104 272T104 329Q104 366 104 407T104 482T104 542T103 586T103 603Q100 622 89 628T44 637H26V660Q26 683 28 683L38 684Q48 685 67 686T104 688Q121 689 141 690T171 693T182 694H185V379Q185 62 186 60Q190 52 198 49Q219 46 247 46H263V0H255L232 1Q209 2 183 2T145 3T107 3T57 1L34 
0H26V46H42Z"></path><path stroke-width="0" id="E1970-MJMAIN-6E" d="M41 46H55Q94 46 102 60V68Q102 77 102 91T102 122T103 161T103 203Q103 234 103 269T102 328V351Q99 370 88 376T43 385H25V408Q25 431 27 431L37 432Q47 433 65 434T102 436Q119 437 138 438T167 441T178 442H181V402Q181 364 182 364T187 369T199 384T218 402T247 421T285 437Q305 442 336 442Q450 438 463 329Q464 322 464 190V104Q464 66 466 59T477 49Q498 46 526 46H542V0H534L510 1Q487 2 460 2T422 3Q319 3 310 0H302V46H318Q379 46 379 62Q380 64 380 200Q379 335 378 343Q372 371 358 385T334 402T308 404Q263 404 229 370Q202 343 195 315T187 232V168V108Q187 78 188 68T191 55T200 49Q221 46 249 46H265V0H257L234 1Q210 2 183 2T145 3Q42 3 33 0H25V46H41Z"></path><path stroke-width="0" id="E1970-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1970-MJMATHI-41" d="M208 74Q208 50 254 46Q272 46 272 35Q272 34 270 22Q267 8 264 4T251 0Q249 0 239 0T205 1T141 2Q70 2 50 0H42Q35 7 35 11Q37 38 48 46H62Q132 49 164 96Q170 102 345 401T523 704Q530 716 547 716H555H572Q578 707 578 706L606 383Q634 60 636 57Q641 46 701 46Q726 46 726 36Q726 34 723 22Q720 7 718 4T704 0Q701 0 690 0T651 1T578 2Q484 2 455 0H443Q437 6 437 9T439 27Q443 40 445 43L449 46H469Q523 49 533 63L521 213H283L249 155Q208 86 208 74ZM516 260Q516 271 504 416T490 562L463 519Q447 492 400 412L310 260L413 259Q516 259 516 260Z"></path><path stroke-width="0" id="E1970-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path stroke-width="0" id="E1970-MJMAIN-3B" 
d="M78 370Q78 394 95 412T138 430Q162 430 180 414T199 371Q199 346 182 328T139 310T96 327T78 370ZM78 60Q78 85 94 103T137 121Q202 121 202 8Q202 -44 183 -94T144 -169T118 -194Q115 -194 106 -186T95 -174Q94 -171 107 -155T137 -107T160 -38Q161 -32 162 -22T165 -4T165 4Q165 5 161 4T142 0Q110 0 94 18T78 60Z"></path><path stroke-width="0" id="E1970-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E1970-MJMAIN-5D" d="M22 710V750H159V-250H22V-210H119V710H22Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1970-MJMATHI-45" x="0" y="0"></use><use xlink:href="#E1970-MJMAIN-5B" x="764" y="0"></use><g transform="translate(1042,0)"><use xlink:href="#E1970-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1970-MJMATHI-74" x="778" y="513"></use></g><use xlink:href="#E1970-MJMAIN-28" x="1947" y="0"></use><g transform="translate(2336,0)"><use xlink:href="#E1970-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1970-MJMATHI-74" x="1111" y="-213"></use></g><use xlink:href="#E1970-MJMAIN-2212" x="3700" y="0"></use><use xlink:href="#E1970-MJMATHI-42" x="4700" y="0"></use><use xlink:href="#E1970-MJMAIN-28" x="5459" y="0"></use><g transform="translate(5848,0)"><use xlink:href="#E1970-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1970-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1970-MJMAIN-29" x="6816" y="0"></use><use xlink:href="#E1970-MJMAIN-29" x="7205" y="0"></use><use xlink:href="#E1970-MJMAIN-2207" x="7594" y="0"></use><g 
transform="translate(8594,0)"><use xlink:href="#E1970-MJMAIN-6C"></use><use xlink:href="#E1970-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1970-MJMATHI-3C0" x="9595" y="0"></use><use xlink:href="#E1970-MJMAIN-28" x="10168" y="0"></use><g transform="translate(10557,0)"><use xlink:href="#E1970-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1970-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1970-MJMAIN-2223" x="11940" y="0"></use><g transform="translate(12495,0)"><use xlink:href="#E1970-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1970-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1970-MJMAIN-3B" x="13464" y="0"></use><use xlink:href="#E1970-MJMATHI-3B8" x="13908" y="0"></use><use xlink:href="#E1970-MJMAIN-29" x="14377" y="0"></use><use xlink:href="#E1970-MJMAIN-5D" x="14766" y="0"></use></g></svg></span><script type="math/tex">E[\gamma^t(G_t-B(S_t))\nabla\ln\pi(A_t \mid S_t;\theta)]</script><span>; the partial derivative of the variance of the bracketed term with respect to </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="5.819ex" height="2.71ex" viewBox="0 -832.7 2505.3 1166.9" role="img" focusable="false" style="vertical-align: -0.776ex;"><defs><path stroke-width="0" id="E1971-MJMATHI-42" d="M231 637Q204 637 199 638T194 649Q194 676 205 682Q206 683 335 683Q594 683 608 681Q671 671 713 636T756 544Q756 480 698 429T565 360L555 357Q619 348 660 311T702 219Q702 146 630 78T453 1Q446 0 242 0Q42 0 39 2Q35 5 35 10Q35 17 37 24Q42 43 47 45Q51 46 62 46H68Q95 46 128 49Q142 52 147 61Q150 65 219 339T288 628Q288 635 231 637ZM649 544Q649 574 634 600T585 634Q578 636 493 637Q473 637 451 637T416 636H403Q388 635 384 626Q382 622 352 506Q352 503 351 500L320 374H401Q482 374 494 376Q554 386 601 434T649 544ZM595 229Q595 273 572 302T512 336Q506 337 429 337Q311 337 310 336Q310 334 293 263T258 122L240 52Q240 48 252 48T333 46Q422 46 429 47Q491 54 543 105T595 229Z"></path><path 
stroke-width="0" id="E1971-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1971-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 462Q176 523 208 573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 577T585 459T569 456Q549 456 549 465Q549 471 550 475Q550 478 551 494T553 520Q553 554 544 579T526 616T501 641Q465 662 419 662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 -18T65 -22Q52 -22 52 -14Q52 -11 110 221Q112 227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 153Q144 114 160 87T203 47T255 29T308 24Z"></path><path stroke-width="0" id="E1971-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E1971-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 
0)"><use xlink:href="#E1971-MJMATHI-42" x="0" y="0"></use><use xlink:href="#E1971-MJMAIN-28" x="759" y="0"></use><g transform="translate(1148,0)"><use xlink:href="#E1971-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1971-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1971-MJMAIN-29" x="2116" y="0"></use></g></svg></span><script type="math/tex">B(S_t)</script><span> is:</span></p><div contenteditable="false" spellcheck="false" class="mathjax-block md-end-block md-math-block md-rawblock" id="mathjax-n40" cid="n40" mdtype="math_block"><div class="md-rawblock-container md-math-container" tabindex="-1"><div class="MathJax_SVG_Display" style="text-align: center;"><span class="MathJax_SVG" id="MathJax-Element-829-Frame" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="90.572ex" height="25.283ex" viewBox="0 -5692 38996.2 10885.5" role="img" focusable="false" style="vertical-align: -12.063ex; max-width: 100%;"><defs><path stroke-width="0" id="E1931-MJMAIN-2202" d="M202 508Q179 508 169 520T158 547Q158 557 164 577T185 624T230 675T301 710L333 715H345Q378 715 384 714Q447 703 489 661T549 568T566 457Q566 362 519 240T402 53Q321 -22 223 -22Q123 -22 73 56Q42 102 42 148V159Q42 276 129 370T322 465Q383 465 414 434T455 367L458 378Q478 461 478 515Q478 603 437 639T344 676Q266 676 223 612Q264 606 264 572Q264 547 246 528T202 508ZM430 306Q430 372 401 400T333 428Q270 428 222 382Q197 354 183 323T150 221Q132 149 132 116Q132 21 232 21Q244 21 250 22Q327 35 374 112Q389 137 409 196T430 306Z"></path><path stroke-width="0" id="E1931-MJMATHI-42" d="M231 637Q204 637 199 638T194 649Q194 676 205 682Q206 683 335 683Q594 683 608 681Q671 671 713 636T756 544Q756 480 698 429T565 360L555 357Q619 348 660 311T702 219Q702 146 630 78T453 1Q446 0 242 0Q42 0 39 2Q35 5 35 10Q35 17 37 24Q42 43 47 45Q51 46 62 46H68Q95 46 128 49Q142 52 147 61Q150 65 219 339T288 628Q288 635 231 637ZM649 544Q649 574 634 600T585 
634Q578 636 493 637Q473 637 451 637T416 636H403Q388 635 384 626Q382 622 352 506Q352 503 351 500L320 374H401Q482 374 494 376Q554 386 601 434T649 544ZM595 229Q595 273 572 302T512 336Q506 337 429 337Q311 337 310 336Q310 334 293 263T258 122L240 52Q240 48 252 48T333 46Q422 46 429 47Q491 54 543 105T595 229Z"></path><path stroke-width="0" id="E1931-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1931-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 462Q176 523 208 573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 577T585 459T569 456Q549 456 549 465Q549 471 550 475Q550 478 551 494T553 520Q553 554 544 579T526 616T501 641Q465 662 419 662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 -18T65 -22Q52 -22 52 -14Q52 -11 110 221Q112 227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 153Q144 114 160 87T203 47T255 29T308 24Z"></path><path stroke-width="0" id="E1931-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E1931-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 
726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path><path stroke-width="0" id="E1931-MJMATHI-45" d="M492 213Q472 213 472 226Q472 230 477 250T482 285Q482 316 461 323T364 330H312Q311 328 277 192T243 52Q243 48 254 48T334 46Q428 46 458 48T518 61Q567 77 599 117T670 248Q680 270 683 272Q690 274 698 274Q718 274 718 261Q613 7 608 2Q605 0 322 0H133Q31 0 31 11Q31 13 34 25Q38 41 42 43T65 46Q92 46 125 49Q139 52 144 61Q146 66 215 342T285 622Q285 629 281 629Q273 632 228 634H197Q191 640 191 642T193 659Q197 676 203 680H757Q764 676 764 669Q764 664 751 557T737 447Q735 440 717 440H705Q698 445 698 453L701 476Q704 500 704 528Q704 558 697 578T678 609T643 625T596 632T532 634H485Q397 633 392 631Q388 629 386 622Q385 619 355 499T324 377Q347 376 372 376H398Q464 376 489 391T534 472Q538 488 540 490T557 493Q562 493 565 493T570 492T572 491T574 487T577 483L544 351Q511 218 508 216Q505 213 492 213Z"></path><path stroke-width="0" id="E1931-MJMAIN-5B" d="M118 -250V750H255V710H158V-210H255V-250H118Z"></path><path stroke-width="0" id="E1931-MJMATHI-3B3" d="M31 249Q11 249 11 258Q11 275 26 304T66 365T129 418T206 441Q233 441 239 440Q287 429 318 386T371 255Q385 195 385 170Q385 166 386 166L398 193Q418 244 443 300T486 391T508 430Q510 431 524 431H537Q543 425 543 422Q543 418 522 378T463 251T391 71Q385 55 378 6T357 -100Q341 -165 330 -190T303 -216Q286 -216 286 -188Q286 -138 340 32L346 51L347 69Q348 79 348 100Q348 257 291 317Q251 355 196 355Q148 355 108 329T51 260Q49 251 47 251Q45 249 31 249Z"></path><path stroke-width="0" id="E1931-MJMATHI-47" d="M50 252Q50 367 117 473T286 641T490 704Q580 704 633 653Q642 643 648 636T656 626L657 623Q660 623 684 649Q691 655 699 663T715 679T725 690L740 705H746Q760 705 760 698Q760 694 728 561Q692 422 692 421Q690 416 687 415T669 413H653Q647 419 647 422Q647 423 648 429T650 449T651 481Q651 552 619 605T510 659Q492 659 471 
656T418 643T357 615T294 567T236 496T189 394T158 260Q156 242 156 221Q156 173 170 136T206 79T256 45T308 28T353 24Q407 24 452 47T514 106Q517 114 529 161T541 214Q541 222 528 224T468 227H431Q425 233 425 235T427 254Q431 267 437 273H454Q494 271 594 271Q634 271 659 271T695 272T707 272Q721 272 721 263Q721 261 719 249Q714 230 709 228Q706 227 694 227Q674 227 653 224Q646 221 643 215T629 164Q620 131 614 108Q589 6 586 3Q584 1 581 1Q571 1 553 21T530 52Q530 53 528 52T522 47Q448 -22 322 -22Q201 -22 126 55T50 252Z"></path><path stroke-width="0" id="E1931-MJMAIN-2212" d="M84 237T84 250T98 270H679Q694 262 694 250T679 230H98Q84 237 84 250Z"></path><path stroke-width="0" id="E1931-MJMAIN-2207" d="M46 676Q46 679 51 683H781Q786 679 786 676Q786 674 617 326T444 -26Q439 -33 416 -33T388 -26Q385 -22 216 326T46 676ZM697 596Q697 597 445 597T193 596Q195 591 319 336T445 80L697 596Z"></path><path stroke-width="0" id="E1931-MJMAIN-6C" d="M42 46H56Q95 46 103 60V68Q103 77 103 91T103 124T104 167T104 217T104 272T104 329Q104 366 104 407T104 482T104 542T103 586T103 603Q100 622 89 628T44 637H26V660Q26 683 28 683L38 684Q48 685 67 686T104 688Q121 689 141 690T171 693T182 694H185V379Q185 62 186 60Q190 52 198 49Q219 46 247 46H263V0H255L232 1Q209 2 183 2T145 3T107 3T57 1L34 0H26V46H42Z"></path><path stroke-width="0" id="E1931-MJMAIN-6E" d="M41 46H55Q94 46 102 60V68Q102 77 102 91T102 122T103 161T103 203Q103 234 103 269T102 328V351Q99 370 88 376T43 385H25V408Q25 431 27 431L37 432Q47 433 65 434T102 436Q119 437 138 438T167 441T178 442H181V402Q181 364 182 364T187 369T199 384T218 402T247 421T285 437Q305 442 336 442Q450 438 463 329Q464 322 464 190V104Q464 66 466 59T477 49Q498 46 526 46H542V0H534L510 1Q487 2 460 2T422 3Q319 3 310 0H302V46H318Q379 46 379 62Q380 64 380 200Q379 335 378 343Q372 371 358 385T334 402T308 404Q263 404 229 370Q202 343 195 315T187 232V168V108Q187 78 188 68T191 55T200 49Q221 46 249 46H265V0H257L234 1Q210 2 183 2T145 3Q42 3 33 0H25V46H41Z"></path><path stroke-width="0" id="E1931-MJMATHI-3C0" d="M132 
-11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1931-MJMATHI-41" d="M208 74Q208 50 254 46Q272 46 272 35Q272 34 270 22Q267 8 264 4T251 0Q249 0 239 0T205 1T141 2Q70 2 50 0H42Q35 7 35 11Q37 38 48 46H62Q132 49 164 96Q170 102 345 401T523 704Q530 716 547 716H555H572Q578 707 578 706L606 383Q634 60 636 57Q641 46 701 46Q726 46 726 36Q726 34 723 22Q720 7 718 4T704 0Q701 0 690 0T651 1T578 2Q484 2 455 0H443Q437 6 437 9T439 27Q443 40 445 43L449 46H469Q523 49 533 63L521 213H283L249 155Q208 86 208 74ZM516 260Q516 271 504 416T490 562L463 519Q447 492 400 412L310 260L413 259Q516 259 516 260Z"></path><path stroke-width="0" id="E1931-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path stroke-width="0" id="E1931-MJMAIN-3B" d="M78 370Q78 394 95 412T138 430Q162 430 180 414T199 371Q199 346 182 328T139 310T96 327T78 370ZM78 60Q78 85 94 103T137 121Q202 121 202 8Q202 -44 183 -94T144 -169T118 -194Q115 -194 106 -186T95 -174Q94 -171 107 -155T137 -107T160 -38Q161 -32 162 -22T165 -4T165 4Q165 5 161 4T142 0Q110 0 94 18T78 60Z"></path><path stroke-width="0" id="E1931-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path 
stroke-width="0" id="E1931-MJMAIN-5D" d="M22 710V750H159V-250H22V-210H119V710H22Z"></path><path stroke-width="0" id="E1931-MJMAIN-32" d="M109 429Q82 429 66 447T50 491Q50 562 103 614T235 666Q326 666 387 610T449 465Q449 422 429 383T381 315T301 241Q265 210 201 149L142 93L218 92Q375 92 385 97Q392 99 409 186V189H449V186Q448 183 436 95T421 3V0H50V19V31Q50 38 56 46T86 81Q115 113 136 137Q145 147 170 174T204 211T233 244T261 278T284 308T305 340T320 369T333 401T340 431T343 464Q343 527 309 573T212 619Q179 619 154 602T119 569T109 550Q109 549 114 549Q132 549 151 535T170 489Q170 464 154 447T109 429Z"></path><path stroke-width="0" id="E1931-MJSZ1-5B" d="M202 -349V850H394V810H242V-309H394V-349H202Z"></path><path stroke-width="0" id="E1931-MJSZ1-5D" d="M22 810V850H214V-349H22V-309H174V810H22Z"></path><path stroke-width="0" id="E1931-MJSZ2-28" d="M180 96T180 250T205 541T266 770T353 944T444 1069T527 1150H555Q561 1144 561 1141Q561 1137 545 1120T504 1072T447 995T386 878T330 721T288 513T272 251Q272 133 280 56Q293 -87 326 -209T399 -405T475 -531T536 -609T561 -640Q561 -643 555 -649H527Q483 -612 443 -568T353 -443T266 -270T205 -41Z"></path><path stroke-width="0" id="E1931-MJSZ2-29" d="M35 1138Q35 1150 51 1150H56H69Q113 1113 153 1069T243 944T330 771T391 541T416 250T391 -40T330 -270T243 -443T152 -568T69 -649H56Q43 -649 39 -647T35 -637Q65 -607 110 -548Q283 -316 316 56Q324 133 324 251Q324 368 316 445Q278 877 48 1123Q36 1137 35 1138Z"></path><path stroke-width="0" id="E1931-MJMAIN-3D" d="M56 347Q56 360 70 367H707Q722 359 722 347Q722 336 708 328L390 327H72Q56 332 56 347ZM56 153Q56 168 72 173H708Q722 163 722 153Q722 140 707 133H70Q56 140 56 153Z"></path><path stroke-width="0" id="E1931-MJMAIN-2B" d="M56 237T56 250T70 270H369V420L370 570Q380 583 389 583Q402 583 409 568V270H707Q722 262 722 250T707 230H409V-68Q401 -82 391 -82H389H387Q375 -82 369 -68V230H70Q56 237 56 250Z"></path><path stroke-width="0" id="E1931-MJSZ3-5B" d="M247 -949V1450H516V1388H309V-887H516V-949H247Z"></path><path stroke-width="0" 
id="E1931-MJSZ3-5D" d="M11 1388V1450H280V-949H11V-887H218V1388H11Z"></path><path stroke-width="0" id="E1931-MJMAIN-30" d="M96 585Q152 666 249 666Q297 666 345 640T423 548Q460 465 460 320Q460 165 417 83Q397 41 362 16T301 -15T250 -22Q224 -22 198 -16T137 16T82 83Q39 165 39 320Q39 494 96 585ZM321 597Q291 629 250 629Q208 629 178 597Q153 571 145 525T137 333Q137 175 145 125T181 46Q209 16 250 16Q290 16 318 46Q347 76 354 130T362 333Q362 478 354 524T321 597Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><g transform="translate(167,0)"><g transform="translate(-19,0)"><g transform="translate(0,4226)"><g transform="translate(120,0)"><rect stroke="none" width="3192" height="60" x="0" y="220"></rect><use xlink:href="#E1931-MJMAIN-2202" x="1312" y="676"></use><g transform="translate(60,-694)"><use xlink:href="#E1931-MJMAIN-2202" x="0" y="0"></use><use xlink:href="#E1931-MJMATHI-42" x="567" y="0"></use><use xlink:href="#E1931-MJMAIN-28" x="1326" y="0"></use><g transform="translate(1715,0)"><use xlink:href="#E1931-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1931-MJMAIN-29" x="2683" y="0"></use></g></g><g transform="translate(3432,0)"><use xlink:href="#E1931-MJSZ2-28"></use><use xlink:href="#E1931-MJMATHI-45" x="597" y="0"></use><g transform="translate(1527,0)"><use xlink:href="#E1931-MJSZ1-5B"></use><use xlink:href="#E1931-MJMAIN-5B" x="417" y="0"></use><g transform="translate(695,0)"><use xlink:href="#E1931-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="778" y="583"></use></g><use xlink:href="#E1931-MJMAIN-28" x="1600" y="0"></use><g transform="translate(1989,0)"><use xlink:href="#E1931-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="1111" y="-213"></use></g><use xlink:href="#E1931-MJMAIN-2212" x="3353" y="0"></use><use 
xlink:href="#E1931-MJMATHI-42" x="4353" y="0"></use><use xlink:href="#E1931-MJMAIN-28" x="5112" y="0"></use><g transform="translate(5501,0)"><use xlink:href="#E1931-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1931-MJMAIN-29" x="6469" y="0"></use><use xlink:href="#E1931-MJMAIN-29" x="6858" y="0"></use><use xlink:href="#E1931-MJMAIN-2207" x="7247" y="0"></use><g transform="translate(8247,0)"><use xlink:href="#E1931-MJMAIN-6C"></use><use xlink:href="#E1931-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1931-MJMATHI-3C0" x="9248" y="0"></use><use xlink:href="#E1931-MJMAIN-28" x="9821" y="0"></use><g transform="translate(10210,0)"><use xlink:href="#E1931-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1931-MJMAIN-2223" x="11593" y="0"></use><g transform="translate(12148,0)"><use xlink:href="#E1931-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1931-MJMAIN-3B" x="13117" y="0"></use><use xlink:href="#E1931-MJMATHI-3B8" x="13561" y="0"></use><use xlink:href="#E1931-MJMAIN-29" x="14030" y="0"></use><g transform="translate(14419,0)"><use xlink:href="#E1931-MJMAIN-5D" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMAIN-32" x="393" y="583"></use></g><use xlink:href="#E1931-MJSZ1-5D" x="15151" y="-1"></use></g><use xlink:href="#E1931-MJMAIN-2212" x="17318" y="0"></use><g transform="translate(18318,0)"><use xlink:href="#E1931-MJSZ1-5B"></use><use xlink:href="#E1931-MJMATHI-45" x="417" y="0"></use><use xlink:href="#E1931-MJMAIN-5B" x="1181" y="0"></use><g transform="translate(1459,0)"><use xlink:href="#E1931-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="778" y="583"></use></g><use xlink:href="#E1931-MJMAIN-28" x="2364" y="0"></use><g 
transform="translate(2753,0)"><use xlink:href="#E1931-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="1111" y="-213"></use></g><use xlink:href="#E1931-MJMAIN-2212" x="4117" y="0"></use><use xlink:href="#E1931-MJMATHI-42" x="5117" y="0"></use><use xlink:href="#E1931-MJMAIN-28" x="5876" y="0"></use><g transform="translate(6265,0)"><use xlink:href="#E1931-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1931-MJMAIN-29" x="7233" y="0"></use><use xlink:href="#E1931-MJMAIN-29" x="7622" y="0"></use><use xlink:href="#E1931-MJMAIN-2207" x="8011" y="0"></use><g transform="translate(9011,0)"><use xlink:href="#E1931-MJMAIN-6C"></use><use xlink:href="#E1931-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1931-MJMATHI-3C0" x="10012" y="0"></use><use xlink:href="#E1931-MJMAIN-28" x="10585" y="0"></use><g transform="translate(10974,0)"><use xlink:href="#E1931-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1931-MJMAIN-2223" x="12357" y="0"></use><g transform="translate(12912,0)"><use xlink:href="#E1931-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1931-MJMAIN-3B" x="13881" y="0"></use><use xlink:href="#E1931-MJMATHI-3B8" x="14325" y="0"></use><use xlink:href="#E1931-MJMAIN-29" x="14794" y="0"></use><use xlink:href="#E1931-MJMAIN-5D" x="15183" y="0"></use><use xlink:href="#E1931-MJSZ1-5D" x="15461" y="-1"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMAIN-32" x="22456" y="823"></use></g><use xlink:href="#E1931-MJSZ2-29" x="34650" y="-1"></use></g></g><g transform="translate(0,1591)"><use xlink:href="#E1931-MJMAIN-3D" x="277" y="0"></use><g transform="translate(1055,0)"><g transform="translate(397,0)"><rect stroke="none" width="3192" height="60" x="0" 
y="220"></rect><use xlink:href="#E1931-MJMAIN-2202" x="1312" y="676"></use><g transform="translate(60,-694)"><use xlink:href="#E1931-MJMAIN-2202" x="0" y="0"></use><use xlink:href="#E1931-MJMATHI-42" x="567" y="0"></use><use xlink:href="#E1931-MJMAIN-28" x="1326" y="0"></use><g transform="translate(1715,0)"><use xlink:href="#E1931-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1931-MJMAIN-29" x="2683" y="0"></use></g></g></g><use xlink:href="#E1931-MJMATHI-45" x="4765" y="0"></use><g transform="translate(5696,0)"><use xlink:href="#E1931-MJSZ1-5B"></use><g transform="translate(417,0)"><use xlink:href="#E1931-MJMATHI-3B3" x="0" y="0"></use><g transform="translate(550,412)"><use transform="scale(0.707)" xlink:href="#E1931-MJMAIN-32" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="500" y="0"></use></g></g><use xlink:href="#E1931-MJMAIN-28" x="1676" y="0"></use><g transform="translate(2065,0)"><use xlink:href="#E1931-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="1111" y="-213"></use></g><use xlink:href="#E1931-MJMAIN-2212" x="3428" y="0"></use><use xlink:href="#E1931-MJMATHI-42" x="4429" y="0"></use><use xlink:href="#E1931-MJMAIN-28" x="5188" y="0"></use><g transform="translate(5577,0)"><use xlink:href="#E1931-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1931-MJMAIN-29" x="6545" y="0"></use><g transform="translate(6934,0)"><use xlink:href="#E1931-MJMAIN-29" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMAIN-32" x="550" y="583"></use></g><use xlink:href="#E1931-MJMAIN-5B" x="7776" y="0"></use><use xlink:href="#E1931-MJMAIN-2207" x="8054" y="0"></use><g transform="translate(9054,0)"><use xlink:href="#E1931-MJMAIN-6C"></use><use xlink:href="#E1931-MJMAIN-6E" x="278" y="0"></use></g><use 
xlink:href="#E1931-MJMATHI-3C0" x="10055" y="0"></use><use xlink:href="#E1931-MJMAIN-28" x="10628" y="0"></use><g transform="translate(11017,0)"><use xlink:href="#E1931-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1931-MJMAIN-2223" x="12400" y="0"></use><g transform="translate(12956,0)"><use xlink:href="#E1931-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1931-MJMAIN-3B" x="13924" y="0"></use><use xlink:href="#E1931-MJMATHI-3B8" x="14368" y="0"></use><use xlink:href="#E1931-MJMAIN-29" x="14837" y="0"></use><g transform="translate(15226,0)"><use xlink:href="#E1931-MJMAIN-5D" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMAIN-32" x="393" y="583"></use></g><use xlink:href="#E1931-MJSZ1-5D" x="15958" y="-1"></use></g><use xlink:href="#E1931-MJMAIN-2212" x="22294" y="0"></use><g transform="translate(23072,0)"><g transform="translate(342,0)"><rect stroke="none" width="3192" height="60" x="0" y="220"></rect><use xlink:href="#E1931-MJMAIN-2202" x="1312" y="676"></use><g transform="translate(60,-694)"><use xlink:href="#E1931-MJMAIN-2202" x="0" y="0"></use><use xlink:href="#E1931-MJMATHI-42" x="567" y="0"></use><use xlink:href="#E1931-MJMAIN-28" x="1326" y="0"></use><g transform="translate(1715,0)"><use xlink:href="#E1931-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1931-MJMAIN-29" x="2683" y="0"></use></g></g></g><g transform="translate(26726,0)"><use xlink:href="#E1931-MJSZ1-5B"></use><use xlink:href="#E1931-MJMATHI-45" x="417" y="0"></use><use xlink:href="#E1931-MJMAIN-5B" x="1181" y="0"></use><g transform="translate(1459,0)"><use xlink:href="#E1931-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="778" y="583"></use></g><g 
transform="translate(2364,0)"><use xlink:href="#E1931-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="1111" y="-213"></use></g><use xlink:href="#E1931-MJMAIN-2207" x="3506" y="0"></use><g transform="translate(4505,0)"><use xlink:href="#E1931-MJMAIN-6C"></use><use xlink:href="#E1931-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1931-MJMATHI-3C0" x="5506" y="0"></use><use xlink:href="#E1931-MJMAIN-28" x="6079" y="0"></use><g transform="translate(6468,0)"><use xlink:href="#E1931-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1931-MJMAIN-2223" x="7851" y="0"></use><g transform="translate(8407,0)"><use xlink:href="#E1931-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1931-MJMAIN-3B" x="9375" y="0"></use><use xlink:href="#E1931-MJMATHI-3B8" x="9820" y="0"></use><use xlink:href="#E1931-MJMAIN-29" x="10289" y="0"></use><use xlink:href="#E1931-MJMAIN-5D" x="10678" y="0"></use><use xlink:href="#E1931-MJSZ1-5D" x="10956" y="-1"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMAIN-32" x="16084" y="823"></use></g></g><g transform="translate(0,-1103)"><use xlink:href="#E1931-MJMAIN-3D" x="277" y="0"></use><use xlink:href="#E1931-MJMATHI-45" x="1333" y="0"></use><g transform="translate(2264,0)"><use xlink:href="#E1931-MJSZ3-5B"></use><g transform="translate(528,0)"><use xlink:href="#E1931-MJMATHI-3B3" x="0" y="0"></use><g transform="translate(550,412)"><use transform="scale(0.707)" xlink:href="#E1931-MJMAIN-32" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="500" y="0"></use></g></g><g transform="translate(1787,0)"><g transform="translate(120,0)"><rect stroke="none" width="3192" height="60" x="0" y="220"></rect><use xlink:href="#E1931-MJMAIN-2202" x="1312" y="676"></use><g 
transform="translate(60,-694)"><use xlink:href="#E1931-MJMAIN-2202" x="0" y="0"></use><use xlink:href="#E1931-MJMATHI-42" x="567" y="0"></use><use xlink:href="#E1931-MJMAIN-28" x="1326" y="0"></use><g transform="translate(1715,0)"><use xlink:href="#E1931-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1931-MJMAIN-29" x="2683" y="0"></use></g></g></g><use xlink:href="#E1931-MJMAIN-28" x="5219" y="0"></use><g transform="translate(5608,0)"><use xlink:href="#E1931-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMAIN-32" x="1111" y="487"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="1111" y="-394"></use></g><use xlink:href="#E1931-MJMAIN-2B" x="7070" y="0"></use><use xlink:href="#E1931-MJMAIN-28" x="8070" y="0"></use><use xlink:href="#E1931-MJMATHI-42" x="8459" y="0"></use><use xlink:href="#E1931-MJMAIN-28" x="9218" y="0"></use><g transform="translate(9607,0)"><use xlink:href="#E1931-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1931-MJMAIN-29" x="10575" y="0"></use><g transform="translate(10964,0)"><use xlink:href="#E1931-MJMAIN-29" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMAIN-32" x="550" y="583"></use></g><use xlink:href="#E1931-MJMAIN-2212" x="12029" y="0"></use><use xlink:href="#E1931-MJMAIN-32" x="13029" y="0"></use><g transform="translate(13529,0)"><use xlink:href="#E1931-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="1111" y="-213"></use></g><use xlink:href="#E1931-MJMATHI-42" x="14671" y="0"></use><use xlink:href="#E1931-MJMAIN-28" x="15430" y="0"></use><g transform="translate(15819,0)"><use xlink:href="#E1931-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="866" y="-213"></use></g><use 
xlink:href="#E1931-MJMAIN-29" x="16787" y="0"></use><use xlink:href="#E1931-MJMAIN-29" x="17176" y="0"></use><use xlink:href="#E1931-MJMAIN-5B" x="17565" y="0"></use><use xlink:href="#E1931-MJMAIN-2207" x="17843" y="0"></use><g transform="translate(18843,0)"><use xlink:href="#E1931-MJMAIN-6C"></use><use xlink:href="#E1931-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1931-MJMATHI-3C0" x="19843" y="0"></use><use xlink:href="#E1931-MJMAIN-28" x="20416" y="0"></use><g transform="translate(20805,0)"><use xlink:href="#E1931-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1931-MJMAIN-2223" x="22188" y="0"></use><g transform="translate(22744,0)"><use xlink:href="#E1931-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1931-MJMAIN-3B" x="23712" y="0"></use><use xlink:href="#E1931-MJMATHI-3B8" x="24157" y="0"></use><use xlink:href="#E1931-MJMAIN-29" x="24626" y="0"></use><g transform="translate(25015,0)"><use xlink:href="#E1931-MJMAIN-5D" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMAIN-32" x="393" y="583"></use></g><use xlink:href="#E1931-MJSZ3-5D" x="25747" y="-1"></use></g><use xlink:href="#E1931-MJMAIN-2212" x="28761" y="0"></use><use xlink:href="#E1931-MJMAIN-30" x="29761" y="0"></use></g><g transform="translate(0,-3236)"><use xlink:href="#E1931-MJMAIN-3D" x="277" y="0"></use><use xlink:href="#E1931-MJMATHI-45" x="1333" y="0"></use><g transform="translate(2264,0)"><use xlink:href="#E1931-MJSZ1-5B"></use><g transform="translate(417,0)"><use xlink:href="#E1931-MJMATHI-3B3" x="0" y="0"></use><g transform="translate(550,412)"><use transform="scale(0.707)" xlink:href="#E1931-MJMAIN-32" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="500" y="0"></use></g></g><use xlink:href="#E1931-MJMAIN-28" x="1676" y="0"></use><use 
xlink:href="#E1931-MJMAIN-30" x="2065" y="0"></use><use xlink:href="#E1931-MJMAIN-2B" x="2787" y="0"></use><use xlink:href="#E1931-MJMAIN-32" x="3787" y="0"></use><use xlink:href="#E1931-MJMATHI-42" x="4287" y="0"></use><use xlink:href="#E1931-MJMAIN-28" x="5046" y="0"></use><g transform="translate(5435,0)"><use xlink:href="#E1931-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1931-MJMAIN-29" x="6404" y="0"></use><use xlink:href="#E1931-MJMAIN-2212" x="7015" y="0"></use><use xlink:href="#E1931-MJMAIN-32" x="8015" y="0"></use><g transform="translate(8515,0)"><use xlink:href="#E1931-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="1111" y="-213"></use></g><use xlink:href="#E1931-MJMAIN-29" x="9656" y="0"></use><use xlink:href="#E1931-MJMAIN-5B" x="10045" y="0"></use><use xlink:href="#E1931-MJMAIN-2207" x="10323" y="0"></use><g transform="translate(11323,0)"><use xlink:href="#E1931-MJMAIN-6C"></use><use xlink:href="#E1931-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1931-MJMATHI-3C0" x="12324" y="0"></use><use xlink:href="#E1931-MJMAIN-28" x="12897" y="0"></use><g transform="translate(13286,0)"><use xlink:href="#E1931-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1931-MJMAIN-2223" x="14669" y="0"></use><g transform="translate(15224,0)"><use xlink:href="#E1931-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1931-MJMAIN-3B" x="16193" y="0"></use><use xlink:href="#E1931-MJMATHI-3B8" x="16637" y="0"></use><use xlink:href="#E1931-MJMAIN-29" x="17106" y="0"></use><g transform="translate(17495,0)"><use xlink:href="#E1931-MJMAIN-5D" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMAIN-32" x="393" y="583"></use></g><use 
xlink:href="#E1931-MJSZ1-5D" x="18227" y="-1"></use></g></g><g transform="translate(0,-4769)"><use xlink:href="#E1931-MJMAIN-3D" x="277" y="0"></use><use xlink:href="#E1931-MJMATHI-45" x="1333" y="0"></use><g transform="translate(2264,0)"><use xlink:href="#E1931-MJSZ1-5B"></use><use xlink:href="#E1931-MJMAIN-2212" x="417" y="0"></use><use xlink:href="#E1931-MJMAIN-32" x="1195" y="0"></use><g transform="translate(1695,0)"><use xlink:href="#E1931-MJMATHI-3B3" x="0" y="0"></use><g transform="translate(550,412)"><use transform="scale(0.707)" xlink:href="#E1931-MJMAIN-32" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="500" y="0"></use></g></g><use xlink:href="#E1931-MJMAIN-28" x="2954" y="0"></use><g transform="translate(3343,0)"><use xlink:href="#E1931-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="1111" y="-213"></use></g><use xlink:href="#E1931-MJMAIN-2212" x="4706" y="0"></use><use xlink:href="#E1931-MJMATHI-42" x="5707" y="0"></use><use xlink:href="#E1931-MJMAIN-28" x="6466" y="0"></use><g transform="translate(6855,0)"><use xlink:href="#E1931-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1931-MJMAIN-29" x="7823" y="0"></use><use xlink:href="#E1931-MJMAIN-29" x="8212" y="0"></use><use xlink:href="#E1931-MJMAIN-5B" x="8601" y="0"></use><use xlink:href="#E1931-MJMAIN-2207" x="8879" y="0"></use><g transform="translate(9879,0)"><use xlink:href="#E1931-MJMAIN-6C"></use><use xlink:href="#E1931-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1931-MJMATHI-3C0" x="10879" y="0"></use><use xlink:href="#E1931-MJMAIN-28" x="11452" y="0"></use><g transform="translate(11841,0)"><use xlink:href="#E1931-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1931-MJMAIN-2223" x="13224" y="0"></use><g 
transform="translate(13780,0)"><use xlink:href="#E1931-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1931-MJMAIN-3B" x="14748" y="0"></use><use xlink:href="#E1931-MJMATHI-3B8" x="15193" y="0"></use><use xlink:href="#E1931-MJMAIN-29" x="15662" y="0"></use><g transform="translate(16051,0)"><use xlink:href="#E1931-MJMAIN-5D" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1931-MJMAIN-32" x="393" y="583"></use></g><use xlink:href="#E1931-MJSZ1-5D" x="16782" y="-1"></use></g></g></g></g></g></svg></span></div><script type="math/tex; mode=display" id="MathJax-Element-829">\begin{split}
& \frac{\partial}{\partial B(S_t)} \left( E\left[[\gamma^t(G_t-B(S_t))\nabla\ln\pi(A_t \mid S_t;\theta)]^2\right] - \left[E[\gamma^t(G_t-B(S_t))\nabla\ln\pi(A_t \mid S_t;\theta)]\right]^2 \right) \\
&= \frac{\partial}{\partial B(S_t)} E\left[\gamma^{2t}(G_t-B(S_t))^2[\nabla\ln\pi(A_t \mid S_t;\theta)]^2\right] - \frac{\partial}{\partial B(S_t)} \left[E[\gamma^t G_t \nabla\ln\pi(A_t \mid S_t;\theta)]\right]^2 \\
&= E\left[\gamma^{2t} \frac{\partial}{\partial B(S_t)} (G_t^2+(B(S_t))^2-2G_tB(S_t))[\nabla\ln\pi(A_t \mid S_t;\theta)]^2\right] - 0 \\
&= E\left[\gamma^{2t} (0+2B(S_t)-2G_t)[\nabla\ln\pi(A_t \mid S_t;\theta)]^2\right] \\
&= E\left[-2\gamma^{2t}(G_t-B(S_t))[\nabla\ln\pi(A_t \mid S_t;\theta)]^2\right]
\end{split}</script></div></div><p><span>Assume </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="64.003ex" height="3.096ex" viewBox="0 -915.7 27556.7 1333" role="img" focusable="false" style="vertical-align: -0.969ex;"><defs><path stroke-width="0" id="E1972-MJMATHI-45" d="M492 213Q472 213 472 226Q472 230 477 250T482 285Q482 316 461 323T364 330H312Q311 328 277 192T243 52Q243 48 254 48T334 46Q428 46 458 48T518 61Q567 77 599 117T670 248Q680 270 683 272Q690 274 698 274Q718 274 718 261Q613 7 608 2Q605 0 322 0H133Q31 0 31 11Q31 13 34 25Q38 41 42 43T65 46Q92 46 125 49Q139 52 144 61Q146 66 215 342T285 622Q285 629 281 629Q273 632 228 634H197Q191 640 191 642T193 659Q197 676 203 680H757Q764 676 764 669Q764 664 751 557T737 447Q735 440 717 440H705Q698 445 698 453L701 476Q704 500 704 528Q704 558 697 578T678 609T643 625T596 632T532 634H485Q397 633 392 631Q388 629 386 622Q385 619 355 499T324 377Q347 376 372 376H398Q464 376 489 391T534 472Q538 488 540 490T557 493Q562 493 565 493T570 492T572 491T574 487T577 483L544 351Q511 218 508 216Q505 213 492 213Z"></path><path stroke-width="0" id="E1972-MJMAIN-5B" d="M118 -250V750H255V710H158V-210H255V-250H118Z"></path><path stroke-width="0" id="E1972-MJMATHI-42" d="M231 637Q204 637 199 638T194 649Q194 676 205 682Q206 683 335 683Q594 683 608 681Q671 671 713 636T756 544Q756 480 698 429T565 360L555 357Q619 348 660 311T702 219Q702 146 630 78T453 1Q446 0 242 0Q42 0 39 2Q35 5 35 10Q35 17 37 24Q42 43 47 45Q51 46 62 46H68Q95 46 128 49Q142 52 147 61Q150 65 219 339T288 628Q288 635 231 637ZM649 544Q649 574 634 600T585 634Q578 636 493 637Q473 637 451 637T416 636H403Q388 635 384 626Q382 622 352 506Q352 503 351 500L320 374H401Q482 374 494 376Q554 386 601 434T649 544ZM595 229Q595 273 572 302T512 336Q506 337 429 337Q311 337 310 336Q310 334 293 263T258 122L240 52Q240 48 252 48T333 46Q422 46 429 47Q491 54 543 105T595 229Z"></path><path stroke-width="0" 
id="E1972-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1972-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 462Q176 523 208 573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 577T585 459T569 456Q549 456 549 465Q549 471 550 475Q550 478 551 494T553 520Q553 554 544 579T526 616T501 641Q465 662 419 662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 -18T65 -22Q52 -22 52 -14Q52 -11 110 221Q112 227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 153Q144 114 160 87T203 47T255 29T308 24Z"></path><path stroke-width="0" id="E1972-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E1972-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path><path stroke-width="0" id="E1972-MJMAIN-2207" d="M46 676Q46 679 51 683H781Q786 679 786 676Q786 674 617 326T444 
-26Q439 -33 416 -33T388 -26Q385 -22 216 326T46 676ZM697 596Q697 597 445 597T193 596Q195 591 319 336T445 80L697 596Z"></path><path stroke-width="0" id="E1972-MJMAIN-6C" d="M42 46H56Q95 46 103 60V68Q103 77 103 91T103 124T104 167T104 217T104 272T104 329Q104 366 104 407T104 482T104 542T103 586T103 603Q100 622 89 628T44 637H26V660Q26 683 28 683L38 684Q48 685 67 686T104 688Q121 689 141 690T171 693T182 694H185V379Q185 62 186 60Q190 52 198 49Q219 46 247 46H263V0H255L232 1Q209 2 183 2T145 3T107 3T57 1L34 0H26V46H42Z"></path><path stroke-width="0" id="E1972-MJMAIN-6E" d="M41 46H55Q94 46 102 60V68Q102 77 102 91T102 122T103 161T103 203Q103 234 103 269T102 328V351Q99 370 88 376T43 385H25V408Q25 431 27 431L37 432Q47 433 65 434T102 436Q119 437 138 438T167 441T178 442H181V402Q181 364 182 364T187 369T199 384T218 402T247 421T285 437Q305 442 336 442Q450 438 463 329Q464 322 464 190V104Q464 66 466 59T477 49Q498 46 526 46H542V0H534L510 1Q487 2 460 2T422 3Q319 3 310 0H302V46H318Q379 46 379 62Q380 64 380 200Q379 335 378 343Q372 371 358 385T334 402T308 404Q263 404 229 370Q202 343 195 315T187 232V168V108Q187 78 188 68T191 55T200 49Q221 46 249 46H265V0H257L234 1Q210 2 183 2T145 3Q42 3 33 0H25V46H41Z"></path><path stroke-width="0" id="E1972-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1972-MJMATHI-41" d="M208 74Q208 50 254 46Q272 46 272 35Q272 34 270 22Q267 8 264 4T251 0Q249 0 239 0T205 1T141 2Q70 2 50 0H42Q35 7 35 11Q37 38 48 46H62Q132 49 164 96Q170 102 345 401T523 704Q530 716 547 716H555H572Q578 707 578 706L606 383Q634 60 636 
57Q641 46 701 46Q726 46 726 36Q726 34 723 22Q720 7 718 4T704 0Q701 0 690 0T651 1T578 2Q484 2 455 0H443Q437 6 437 9T439 27Q443 40 445 43L449 46H469Q523 49 533 63L521 213H283L249 155Q208 86 208 74ZM516 260Q516 271 504 416T490 562L463 519Q447 492 400 412L310 260L413 259Q516 259 516 260Z"></path><path stroke-width="0" id="E1972-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path stroke-width="0" id="E1972-MJMAIN-3B" d="M78 370Q78 394 95 412T138 430Q162 430 180 414T199 371Q199 346 182 328T139 310T96 327T78 370ZM78 60Q78 85 94 103T137 121Q202 121 202 8Q202 -44 183 -94T144 -169T118 -194Q115 -194 106 -186T95 -174Q94 -171 107 -155T137 -107T160 -38Q161 -32 162 -22T165 -4T165 4Q165 5 161 4T142 0Q110 0 94 18T78 60Z"></path><path stroke-width="0" id="E1972-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E1972-MJMAIN-5D" d="M22 710V750H159V-250H22V-210H119V710H22Z"></path><path stroke-width="0" id="E1972-MJMAIN-32" d="M109 429Q82 429 66 447T50 491Q50 562 103 614T235 666Q326 666 387 610T449 465Q449 422 429 383T381 315T301 241Q265 210 201 149L142 93L218 92Q375 92 385 97Q392 99 409 186V189H449V186Q448 183 436 95T421 3V0H50V19V31Q50 38 56 46T86 81Q115 113 136 137Q145 147 170 174T204 211T233 244T261 278T284 308T305 340T320 369T333 401T340 431T343 464Q343 527 309 573T212 619Q179 619 154 602T119 569T109 550Q109 549 114 549Q132 549 151 535T170 489Q170 464 154 447T109 429Z"></path><path stroke-width="0" id="E1972-MJSZ1-5B" d="M202 -349V850H394V810H242V-309H394V-349H202Z"></path><path stroke-width="0" id="E1972-MJSZ1-5D" 
d="M22 810V850H214V-349H22V-309H174V810H22Z"></path><path stroke-width="0" id="E1972-MJMAIN-3D" d="M56 347Q56 360 70 367H707Q722 359 722 347Q722 336 708 328L390 327H72Q56 332 56 347ZM56 153Q56 168 72 173H708Q722 163 722 153Q722 140 707 133H70Q56 140 56 153Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1972-MJMATHI-45" x="0" y="0"></use><g transform="translate(930,0)"><use xlink:href="#E1972-MJSZ1-5B"></use><use xlink:href="#E1972-MJMATHI-42" x="417" y="0"></use><use xlink:href="#E1972-MJMAIN-28" x="1176" y="0"></use><g transform="translate(1565,0)"><use xlink:href="#E1972-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1972-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1972-MJMAIN-29" x="2533" y="0"></use><use xlink:href="#E1972-MJMAIN-5B" x="2922" y="0"></use><use xlink:href="#E1972-MJMAIN-2207" x="3200" y="0"></use><g transform="translate(4199,0)"><use xlink:href="#E1972-MJMAIN-6C"></use><use xlink:href="#E1972-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1972-MJMATHI-3C0" x="5200" y="0"></use><use xlink:href="#E1972-MJMAIN-28" x="5773" y="0"></use><g transform="translate(6162,0)"><use xlink:href="#E1972-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1972-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1972-MJMAIN-2223" x="7545" y="0"></use><g transform="translate(8101,0)"><use xlink:href="#E1972-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1972-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1972-MJMAIN-3B" x="9069" y="0"></use><use xlink:href="#E1972-MJMATHI-3B8" x="9514" y="0"></use><use xlink:href="#E1972-MJMAIN-29" x="9983" y="0"></use><g transform="translate(10372,0)"><use xlink:href="#E1972-MJMAIN-5D" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1972-MJMAIN-32" x="393" y="513"></use></g><use xlink:href="#E1972-MJSZ1-5D" x="11103" 
y="-1"></use></g><use xlink:href="#E1972-MJMAIN-3D" x="12729" y="0"></use><use xlink:href="#E1972-MJMATHI-45" x="13785" y="0"></use><use xlink:href="#E1972-MJMAIN-5B" x="14549" y="0"></use><use xlink:href="#E1972-MJMATHI-42" x="14827" y="0"></use><use xlink:href="#E1972-MJMAIN-28" x="15586" y="0"></use><g transform="translate(15975,0)"><use xlink:href="#E1972-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1972-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1972-MJMAIN-29" x="16943" y="0"></use><use xlink:href="#E1972-MJMAIN-5D" x="17332" y="0"></use><use xlink:href="#E1972-MJMATHI-45" x="17610" y="0"></use><g transform="translate(18541,0)"><use xlink:href="#E1972-MJSZ1-5B"></use><use xlink:href="#E1972-MJMAIN-5B" x="417" y="0"></use><use xlink:href="#E1972-MJMAIN-2207" x="695" y="0"></use><g transform="translate(1694,0)"><use xlink:href="#E1972-MJMAIN-6C"></use><use xlink:href="#E1972-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1972-MJMATHI-3C0" x="2695" y="0"></use><use xlink:href="#E1972-MJMAIN-28" x="3268" y="0"></use><g transform="translate(3657,0)"><use xlink:href="#E1972-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1972-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1972-MJMAIN-2223" x="5040" y="0"></use><g transform="translate(5596,0)"><use xlink:href="#E1972-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1972-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1972-MJMAIN-3B" x="6564" y="0"></use><use xlink:href="#E1972-MJMATHI-3B8" x="7009" y="0"></use><use xlink:href="#E1972-MJMAIN-29" x="7478" y="0"></use><g transform="translate(7867,0)"><use xlink:href="#E1972-MJMAIN-5D" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1972-MJMAIN-32" x="393" y="513"></use></g><use xlink:href="#E1972-MJSZ1-5D" x="8598" y="-1"></use></g></g></svg></span><script type="math/tex">E\left[B(S_t)[\nabla\ln\pi(A_t \mid S_t;\theta)]^2\right] = 
E[B(S_t)]E\left[[\nabla\ln\pi(A_t \mid S_t;\theta)]^2\right]</script><span>, i.e., the two factors are independent, and setting the partial derivative above to 0, we obtain:</span></p><div contenteditable="false" spellcheck="false" class="mathjax-block md-end-block md-math-block md-rawblock" id="mathjax-n42" cid="n42" mdtype="math_block"><div class="md-rawblock-container md-math-container" tabindex="-1"><div class="MathJax_SVG_Display" style="text-align: center;"><span class="MathJax_SVG" id="MathJax-Element-830-Frame" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="72.529ex" height="21.038ex" viewBox="0 -4778.2 31227.7 9058.1" role="img" focusable="false" style="vertical-align: -9.94ex; max-width: 100%;"><defs><path stroke-width="0" id="E1932-MJMATHI-45" d="M492 213Q472 213 472 226Q472 230 477 250T482 285Q482 316 461 323T364 330H312Q311 328 277 192T243 52Q243 48 254 48T334 46Q428 46 458 48T518 61Q567 77 599 117T670 248Q680 270 683 272Q690 274 698 274Q718 274 718 261Q613 7 608 2Q605 0 322 0H133Q31 0 31 11Q31 13 34 25Q38 41 42 43T65 46Q92 46 125 49Q139 52 144 61Q146 66 215 342T285 622Q285 629 281 629Q273 632 228 634H197Q191 640 191 642T193 659Q197 676 203 680H757Q764 676 764 669Q764 664 751 557T737 447Q735 440 717 440H705Q698 445 698 453L701 476Q704 500 704 528Q704 558 697 578T678 609T643 625T596 632T532 634H485Q397 633 392 631Q388 629 386 622Q385 619 355 499T324 377Q347 376 372 376H398Q464 376 489 391T534 472Q538 488 540 490T557 493Q562 493 565 493T570 492T572 491T574 487T577 483L544 351Q511 218 508 216Q505 213 492 213Z"></path><path stroke-width="0" id="E1932-MJMAIN-5B" d="M118 -250V750H255V710H158V-210H255V-250H118Z"></path><path stroke-width="0" id="E1932-MJMAIN-2212" d="M84 237T84 250T98 270H679Q694 262 694 250T679 230H98Q84 237 84 250Z"></path><path stroke-width="0" id="E1932-MJMAIN-32" d="M109 429Q82 429 66 447T50 491Q50 562 103 614T235 666Q326 666 387 610T449 465Q449 422 429 383T381 315T301 241Q265 210 201 149L142 93L218 92Q375 92 385 97Q392 99 409 
186V189H449V186Q448 183 436 95T421 3V0H50V19V31Q50 38 56 46T86 81Q115 113 136 137Q145 147 170 174T204 211T233 244T261 278T284 308T305 340T320 369T333 401T340 431T343 464Q343 527 309 573T212 619Q179 619 154 602T119 569T109 550Q109 549 114 549Q132 549 151 535T170 489Q170 464 154 447T109 429Z"></path><path stroke-width="0" id="E1932-MJMATHI-3B3" d="M31 249Q11 249 11 258Q11 275 26 304T66 365T129 418T206 441Q233 441 239 440Q287 429 318 386T371 255Q385 195 385 170Q385 166 386 166L398 193Q418 244 443 300T486 391T508 430Q510 431 524 431H537Q543 425 543 422Q543 418 522 378T463 251T391 71Q385 55 378 6T357 -100Q341 -165 330 -190T303 -216Q286 -216 286 -188Q286 -138 340 32L346 51L347 69Q348 79 348 100Q348 257 291 317Q251 355 196 355Q148 355 108 329T51 260Q49 251 47 251Q45 249 31 249Z"></path><path stroke-width="0" id="E1932-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E1932-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1932-MJMATHI-47" d="M50 252Q50 367 117 473T286 641T490 704Q580 704 633 653Q642 643 648 636T656 626L657 623Q660 623 684 649Q691 655 699 663T715 679T725 690L740 705H746Q760 705 760 698Q760 694 728 561Q692 422 692 421Q690 416 687 415T669 413H653Q647 419 647 422Q647 423 648 429T650 449T651 481Q651 552 619 605T510 659Q492 659 471 656T418 643T357 
615T294 567T236 496T189 394T158 260Q156 242 156 221Q156 173 170 136T206 79T256 45T308 28T353 24Q407 24 452 47T514 106Q517 114 529 161T541 214Q541 222 528 224T468 227H431Q425 233 425 235T427 254Q431 267 437 273H454Q494 271 594 271Q634 271 659 271T695 272T707 272Q721 272 721 263Q721 261 719 249Q714 230 709 228Q706 227 694 227Q674 227 653 224Q646 221 643 215T629 164Q620 131 614 108Q589 6 586 3Q584 1 581 1Q571 1 553 21T530 52Q530 53 528 52T522 47Q448 -22 322 -22Q201 -22 126 55T50 252Z"></path><path stroke-width="0" id="E1932-MJMATHI-42" d="M231 637Q204 637 199 638T194 649Q194 676 205 682Q206 683 335 683Q594 683 608 681Q671 671 713 636T756 544Q756 480 698 429T565 360L555 357Q619 348 660 311T702 219Q702 146 630 78T453 1Q446 0 242 0Q42 0 39 2Q35 5 35 10Q35 17 37 24Q42 43 47 45Q51 46 62 46H68Q95 46 128 49Q142 52 147 61Q150 65 219 339T288 628Q288 635 231 637ZM649 544Q649 574 634 600T585 634Q578 636 493 637Q473 637 451 637T416 636H403Q388 635 384 626Q382 622 352 506Q352 503 351 500L320 374H401Q482 374 494 376Q554 386 601 434T649 544ZM595 229Q595 273 572 302T512 336Q506 337 429 337Q311 337 310 336Q310 334 293 263T258 122L240 52Q240 48 252 48T333 46Q422 46 429 47Q491 54 543 105T595 229Z"></path><path stroke-width="0" id="E1932-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 462Q176 523 208 573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 577T585 459T569 456Q549 456 549 465Q549 471 550 475Q550 478 551 494T553 520Q553 554 544 579T526 616T501 641Q465 662 419 662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 -18T65 -22Q52 -22 52 -14Q52 -11 110 221Q112 227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 153Q144 114 160 87T203 47T255 29T308 24Z"></path><path stroke-width="0" 
id="E1932-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path><path stroke-width="0" id="E1932-MJMAIN-2207" d="M46 676Q46 679 51 683H781Q786 679 786 676Q786 674 617 326T444 -26Q439 -33 416 -33T388 -26Q385 -22 216 326T46 676ZM697 596Q697 597 445 597T193 596Q195 591 319 336T445 80L697 596Z"></path><path stroke-width="0" id="E1932-MJMAIN-6C" d="M42 46H56Q95 46 103 60V68Q103 77 103 91T103 124T104 167T104 217T104 272T104 329Q104 366 104 407T104 482T104 542T103 586T103 603Q100 622 89 628T44 637H26V660Q26 683 28 683L38 684Q48 685 67 686T104 688Q121 689 141 690T171 693T182 694H185V379Q185 62 186 60Q190 52 198 49Q219 46 247 46H263V0H255L232 1Q209 2 183 2T145 3T107 3T57 1L34 0H26V46H42Z"></path><path stroke-width="0" id="E1932-MJMAIN-6E" d="M41 46H55Q94 46 102 60V68Q102 77 102 91T102 122T103 161T103 203Q103 234 103 269T102 328V351Q99 370 88 376T43 385H25V408Q25 431 27 431L37 432Q47 433 65 434T102 436Q119 437 138 438T167 441T178 442H181V402Q181 364 182 364T187 369T199 384T218 402T247 421T285 437Q305 442 336 442Q450 438 463 329Q464 322 464 190V104Q464 66 466 59T477 49Q498 46 526 46H542V0H534L510 1Q487 2 460 2T422 3Q319 3 310 0H302V46H318Q379 46 379 62Q380 64 380 200Q379 335 378 343Q372 371 358 385T334 402T308 404Q263 404 229 370Q202 343 195 315T187 232V168V108Q187 78 188 68T191 55T200 49Q221 46 249 46H265V0H257L234 1Q210 2 183 2T145 3Q42 3 33 0H25V46H41Z"></path><path stroke-width="0" id="E1932-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 
77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1932-MJMATHI-41" d="M208 74Q208 50 254 46Q272 46 272 35Q272 34 270 22Q267 8 264 4T251 0Q249 0 239 0T205 1T141 2Q70 2 50 0H42Q35 7 35 11Q37 38 48 46H62Q132 49 164 96Q170 102 345 401T523 704Q530 716 547 716H555H572Q578 707 578 706L606 383Q634 60 636 57Q641 46 701 46Q726 46 726 36Q726 34 723 22Q720 7 718 4T704 0Q701 0 690 0T651 1T578 2Q484 2 455 0H443Q437 6 437 9T439 27Q443 40 445 43L449 46H469Q523 49 533 63L521 213H283L249 155Q208 86 208 74ZM516 260Q516 271 504 416T490 562L463 519Q447 492 400 412L310 260L413 259Q516 259 516 260Z"></path><path stroke-width="0" id="E1932-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path stroke-width="0" id="E1932-MJMAIN-3B" d="M78 370Q78 394 95 412T138 430Q162 430 180 414T199 371Q199 346 182 328T139 310T96 327T78 370ZM78 60Q78 85 94 103T137 121Q202 121 202 8Q202 -44 183 -94T144 -169T118 -194Q115 -194 106 -186T95 -174Q94 -171 107 -155T137 -107T160 -38Q161 -32 162 -22T165 -4T165 4Q165 5 161 4T142 0Q110 0 94 18T78 60Z"></path><path stroke-width="0" id="E1932-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E1932-MJMAIN-5D" d="M22 710V750H159V-250H22V-210H119V710H22Z"></path><path stroke-width="0" id="E1932-MJSZ1-5B" d="M202 -349V850H394V810H242V-309H394V-349H202Z"></path><path stroke-width="0" id="E1932-MJSZ1-5D" d="M22 810V850H214V-349H22V-309H174V810H22Z"></path><path stroke-width="0" id="E1932-MJMAIN-3D" d="M56 347Q56 360 70 
367H707Q722 359 722 347Q722 336 708 328L390 327H72Q56 332 56 347ZM56 153Q56 168 72 173H708Q722 163 722 153Q722 140 707 133H70Q56 140 56 153Z"></path><path stroke-width="0" id="E1932-MJMAIN-30" d="M96 585Q152 666 249 666Q297 666 345 640T423 548Q460 465 460 320Q460 165 417 83Q397 41 362 16T301 -15T250 -22Q224 -22 198 -16T137 16T82 83Q39 165 39 320Q39 494 96 585ZM321 597Q291 629 250 629Q208 629 178 597Q153 571 145 525T137 333Q137 175 145 125T181 46Q209 16 250 16Q290 16 318 46Q347 76 354 130T362 333Q362 478 354 524T321 597Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><g transform="translate(167,0)"><g transform="translate(-19,0)"><g transform="translate(0,3842)"><use xlink:href="#E1932-MJMATHI-45" x="0" y="0"></use><g transform="translate(930,0)"><use xlink:href="#E1932-MJSZ1-5B"></use><use xlink:href="#E1932-MJMAIN-2212" x="417" y="0"></use><use xlink:href="#E1932-MJMAIN-32" x="1195" y="0"></use><g transform="translate(1695,0)"><use xlink:href="#E1932-MJMATHI-3B3" x="0" y="0"></use><g transform="translate(550,412)"><use transform="scale(0.707)" xlink:href="#E1932-MJMAIN-32" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1932-MJMATHI-74" x="500" y="0"></use></g></g><use xlink:href="#E1932-MJMAIN-28" x="2954" y="0"></use><g transform="translate(3343,0)"><use xlink:href="#E1932-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1932-MJMATHI-74" x="1111" y="-213"></use></g><use xlink:href="#E1932-MJMAIN-2212" x="4706" y="0"></use><use xlink:href="#E1932-MJMATHI-42" x="5707" y="0"></use><use xlink:href="#E1932-MJMAIN-28" x="6466" y="0"></use><g transform="translate(6855,0)"><use xlink:href="#E1932-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1932-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1932-MJMAIN-29" x="7823" y="0"></use><use xlink:href="#E1932-MJMAIN-29" x="8212" y="0"></use><use xlink:href="#E1932-MJMAIN-5B" x="8601" 
y="0"></use><use xlink:href="#E1932-MJMAIN-2207" x="8879" y="0"></use><g transform="translate(9879,0)"><use xlink:href="#E1932-MJMAIN-6C"></use><use xlink:href="#E1932-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1932-MJMATHI-3C0" x="10879" y="0"></use><use xlink:href="#E1932-MJMAIN-28" x="11452" y="0"></use><g transform="translate(11841,0)"><use xlink:href="#E1932-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1932-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1932-MJMAIN-2223" x="13224" y="0"></use><g transform="translate(13780,0)"><use xlink:href="#E1932-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1932-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1932-MJMAIN-3B" x="14748" y="0"></use><use xlink:href="#E1932-MJMATHI-3B8" x="15193" y="0"></use><use xlink:href="#E1932-MJMAIN-29" x="15662" y="0"></use><g transform="translate(16051,0)"><use xlink:href="#E1932-MJMAIN-5D" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1932-MJMAIN-32" x="393" y="583"></use></g><use xlink:href="#E1932-MJSZ1-5D" x="16782" y="-1"></use></g></g><g transform="translate(2537,2309)"><use xlink:href="#E1932-MJMATHI-45" x="0" y="0"></use><g transform="translate(930,0)"><use xlink:href="#E1932-MJSZ1-5B"></use><use xlink:href="#E1932-MJMAIN-28" x="417" y="0"></use><g transform="translate(806,0)"><use xlink:href="#E1932-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1932-MJMATHI-74" x="1111" y="-213"></use></g><use xlink:href="#E1932-MJMAIN-2212" x="2169" y="0"></use><use xlink:href="#E1932-MJMATHI-42" x="3169" y="0"></use><use xlink:href="#E1932-MJMAIN-28" x="3928" y="0"></use><g transform="translate(4317,0)"><use xlink:href="#E1932-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1932-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1932-MJMAIN-29" x="5285" y="0"></use><use xlink:href="#E1932-MJMAIN-29" x="5674" y="0"></use><use 
xlink:href="#E1932-MJMAIN-5B" x="6063" y="0"></use><use xlink:href="#E1932-MJMAIN-2207" x="6341" y="0"></use><g transform="translate(7341,0)"><use xlink:href="#E1932-MJMAIN-6C"></use><use xlink:href="#E1932-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1932-MJMATHI-3C0" x="8342" y="0"></use><use xlink:href="#E1932-MJMAIN-28" x="8915" y="0"></use><g transform="translate(9304,0)"><use xlink:href="#E1932-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1932-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1932-MJMAIN-2223" x="10687" y="0"></use><g transform="translate(11243,0)"><use xlink:href="#E1932-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1932-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1932-MJMAIN-3B" x="12211" y="0"></use><use xlink:href="#E1932-MJMATHI-3B8" x="12656" y="0"></use><use xlink:href="#E1932-MJMAIN-29" x="13125" y="0"></use><g transform="translate(13514,0)"><use xlink:href="#E1932-MJMAIN-5D" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1932-MJMAIN-32" x="393" y="583"></use></g><use xlink:href="#E1932-MJSZ1-5D" x="14245" y="-1"></use></g></g><g transform="translate(5679,776)"><use xlink:href="#E1932-MJMATHI-45" x="0" y="0"></use><g transform="translate(930,0)"><use xlink:href="#E1932-MJSZ1-5B"></use><use xlink:href="#E1932-MJMATHI-42" x="417" y="0"></use><use xlink:href="#E1932-MJMAIN-28" x="1176" y="0"></use><g transform="translate(1565,0)"><use xlink:href="#E1932-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1932-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1932-MJMAIN-29" x="2533" y="0"></use><use xlink:href="#E1932-MJMAIN-5B" x="2922" y="0"></use><use xlink:href="#E1932-MJMAIN-2207" x="3200" y="0"></use><g transform="translate(4199,0)"><use xlink:href="#E1932-MJMAIN-6C"></use><use xlink:href="#E1932-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1932-MJMATHI-3C0" x="5200" y="0"></use><use 
xlink:href="#E1932-MJMAIN-28" x="5773" y="0"></use><g transform="translate(6162,0)"><use xlink:href="#E1932-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1932-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1932-MJMAIN-2223" x="7545" y="0"></use><g transform="translate(8101,0)"><use xlink:href="#E1932-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1932-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1932-MJMAIN-3B" x="9069" y="0"></use><use xlink:href="#E1932-MJMATHI-3B8" x="9514" y="0"></use><use xlink:href="#E1932-MJMAIN-29" x="9983" y="0"></use><g transform="translate(10372,0)"><use xlink:href="#E1932-MJMAIN-5D" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1932-MJMAIN-32" x="393" y="583"></use></g><use xlink:href="#E1932-MJSZ1-5D" x="11103" y="-1"></use></g></g><g transform="translate(4359,-758)"><use xlink:href="#E1932-MJMATHI-45" x="0" y="0"></use><use xlink:href="#E1932-MJMAIN-5B" x="764" y="0"></use><use xlink:href="#E1932-MJMATHI-42" x="1042" y="0"></use><use xlink:href="#E1932-MJMAIN-28" x="1801" y="0"></use><g transform="translate(2190,0)"><use xlink:href="#E1932-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1932-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1932-MJMAIN-29" x="3158" y="0"></use><use xlink:href="#E1932-MJMAIN-5D" x="3547" y="0"></use><use xlink:href="#E1932-MJMATHI-45" x="3825" y="0"></use><g transform="translate(4755,0)"><use xlink:href="#E1932-MJSZ1-5B"></use><use xlink:href="#E1932-MJMAIN-5B" x="417" y="0"></use><use xlink:href="#E1932-MJMAIN-2207" x="695" y="0"></use><g transform="translate(1694,0)"><use xlink:href="#E1932-MJMAIN-6C"></use><use xlink:href="#E1932-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1932-MJMATHI-3C0" x="2695" y="0"></use><use xlink:href="#E1932-MJMAIN-28" x="3268" y="0"></use><g transform="translate(3657,0)"><use xlink:href="#E1932-MJMATHI-41" x="0" y="0"></use><use 
transform="scale(0.707)" xlink:href="#E1932-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1932-MJMAIN-2223" x="5040" y="0"></use><g transform="translate(5596,0)"><use xlink:href="#E1932-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1932-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1932-MJMAIN-3B" x="6564" y="0"></use><use xlink:href="#E1932-MJMATHI-3B8" x="7009" y="0"></use><use xlink:href="#E1932-MJMAIN-29" x="7478" y="0"></use><g transform="translate(7867,0)"><use xlink:href="#E1932-MJMAIN-5D" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1932-MJMAIN-32" x="393" y="583"></use></g><use xlink:href="#E1932-MJSZ1-5D" x="8598" y="-1"></use></g></g><g transform="translate(14305,-3050)"><use xlink:href="#E1932-MJMATHI-45" x="0" y="0"></use><use xlink:href="#E1932-MJMAIN-5B" x="764" y="0"></use><use xlink:href="#E1932-MJMATHI-42" x="1042" y="0"></use><use xlink:href="#E1932-MJMAIN-28" x="1801" y="0"></use><g transform="translate(2190,0)"><use xlink:href="#E1932-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1932-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1932-MJMAIN-29" x="3158" y="0"></use><use xlink:href="#E1932-MJMAIN-5D" x="3547" y="0"></use></g></g><g transform="translate(18112,0)"><g transform="translate(0,3842)"><use xlink:href="#E1932-MJMAIN-3D" x="277" y="0"></use><use xlink:href="#E1932-MJMAIN-30" x="1333" y="0"></use></g><g transform="translate(0,2309)"><use xlink:href="#E1932-MJMAIN-3D" x="277" y="0"></use><use xlink:href="#E1932-MJMAIN-30" x="1333" y="0"></use></g><g transform="translate(0,776)"><use xlink:href="#E1932-MJMAIN-3D" x="277" y="0"></use><use xlink:href="#E1932-MJMATHI-45" x="1333" y="0"></use><g transform="translate(2264,0)"><use xlink:href="#E1932-MJSZ1-5B"></use><g transform="translate(417,0)"><use xlink:href="#E1932-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1932-MJMATHI-74" x="1111" 
y="-213"></use></g><use xlink:href="#E1932-MJMAIN-5B" x="1558" y="0"></use><use xlink:href="#E1932-MJMAIN-2207" x="1836" y="0"></use><g transform="translate(2835,0)"><use xlink:href="#E1932-MJMAIN-6C"></use><use xlink:href="#E1932-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1932-MJMATHI-3C0" x="3836" y="0"></use><use xlink:href="#E1932-MJMAIN-28" x="4409" y="0"></use><g transform="translate(4798,0)"><use xlink:href="#E1932-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1932-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1932-MJMAIN-2223" x="6181" y="0"></use><g transform="translate(6737,0)"><use xlink:href="#E1932-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1932-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1932-MJMAIN-3B" x="7705" y="0"></use><use xlink:href="#E1932-MJMATHI-3B8" x="8150" y="0"></use><use xlink:href="#E1932-MJMAIN-29" x="8619" y="0"></use><g transform="translate(9008,0)"><use xlink:href="#E1932-MJMAIN-5D" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1932-MJMAIN-32" x="393" y="583"></use></g><use xlink:href="#E1932-MJSZ1-5D" x="9739" y="-1"></use></g></g><g transform="translate(0,-758)"><use xlink:href="#E1932-MJMAIN-3D" x="277" y="0"></use><use xlink:href="#E1932-MJMATHI-45" x="1333" y="0"></use><g transform="translate(2264,0)"><use xlink:href="#E1932-MJSZ1-5B"></use><g transform="translate(417,0)"><use xlink:href="#E1932-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1932-MJMATHI-74" x="1111" y="-213"></use></g><use xlink:href="#E1932-MJMAIN-5B" x="1558" y="0"></use><use xlink:href="#E1932-MJMAIN-2207" x="1836" y="0"></use><g transform="translate(2835,0)"><use xlink:href="#E1932-MJMAIN-6C"></use><use xlink:href="#E1932-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1932-MJMATHI-3C0" x="3836" y="0"></use><use xlink:href="#E1932-MJMAIN-28" x="4409" y="0"></use><g transform="translate(4798,0)"><use 
xlink:href="#E1932-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1932-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1932-MJMAIN-2223" x="6181" y="0"></use><g transform="translate(6737,0)"><use xlink:href="#E1932-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1932-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1932-MJMAIN-3B" x="7705" y="0"></use><use xlink:href="#E1932-MJMATHI-3B8" x="8150" y="0"></use><use xlink:href="#E1932-MJMAIN-29" x="8619" y="0"></use><g transform="translate(9008,0)"><use xlink:href="#E1932-MJMAIN-5D" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1932-MJMAIN-32" x="393" y="583"></use></g><use xlink:href="#E1932-MJSZ1-5D" x="9739" y="-1"></use></g></g><g transform="translate(0,-3050)"><use xlink:href="#E1932-MJMAIN-3D" x="277" y="0"></use><g transform="translate(1055,0)"><g transform="translate(397,0)"><rect stroke="none" width="11207" height="60" x="0" y="220"></rect><g transform="translate(60,793)"><use xlink:href="#E1932-MJMATHI-45" x="0" y="0"></use><g transform="translate(930,0)"><use xlink:href="#E1932-MJSZ1-5B"></use><g transform="translate(417,0)"><use xlink:href="#E1932-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1932-MJMATHI-74" x="1111" y="-213"></use></g><use xlink:href="#E1932-MJMAIN-5B" x="1558" y="0"></use><use xlink:href="#E1932-MJMAIN-2207" x="1836" y="0"></use><g transform="translate(2835,0)"><use xlink:href="#E1932-MJMAIN-6C"></use><use xlink:href="#E1932-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1932-MJMATHI-3C0" x="3836" y="0"></use><use xlink:href="#E1932-MJMAIN-28" x="4409" y="0"></use><g transform="translate(4798,0)"><use xlink:href="#E1932-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1932-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1932-MJMAIN-2223" x="6181" y="0"></use><g transform="translate(6737,0)"><use xlink:href="#E1932-MJMATHI-53" 
x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1932-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1932-MJMAIN-3B" x="7705" y="0"></use><use xlink:href="#E1932-MJMATHI-3B8" x="8150" y="0"></use><use xlink:href="#E1932-MJMAIN-29" x="8619" y="0"></use><g transform="translate(9008,0)"><use xlink:href="#E1932-MJMAIN-5D" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1932-MJMAIN-32" x="393" y="513"></use></g><use xlink:href="#E1932-MJSZ1-5D" x="9739" y="-1"></use></g></g><g transform="translate(630,-828)"><use xlink:href="#E1932-MJMATHI-45" x="0" y="0"></use><g transform="translate(930,0)"><use xlink:href="#E1932-MJSZ1-5B"></use><use xlink:href="#E1932-MJMAIN-5B" x="417" y="0"></use><use xlink:href="#E1932-MJMAIN-2207" x="695" y="0"></use><g transform="translate(1694,0)"><use xlink:href="#E1932-MJMAIN-6C"></use><use xlink:href="#E1932-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1932-MJMATHI-3C0" x="2695" y="0"></use><use xlink:href="#E1932-MJMAIN-28" x="3268" y="0"></use><g transform="translate(3657,0)"><use xlink:href="#E1932-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1932-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1932-MJMAIN-2223" x="5040" y="0"></use><g transform="translate(5596,0)"><use xlink:href="#E1932-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1932-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1932-MJMAIN-3B" x="6564" y="0"></use><use xlink:href="#E1932-MJMATHI-3B8" x="7009" y="0"></use><use xlink:href="#E1932-MJMAIN-29" x="7478" y="0"></use><g transform="translate(7867,0)"><use xlink:href="#E1932-MJMAIN-5D" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1932-MJMAIN-32" x="393" y="583"></use></g><use xlink:href="#E1932-MJSZ1-5D" x="8598" y="-1"></use></g></g></g></g></g></g></g></g></svg></span></div><script type="math/tex; mode=display" id="MathJax-Element-830">\begin{split}
E\left[-2\gamma^{2t}(G_t-B(S_t))[\nabla\ln\pi(A_t \mid S_t;\theta)]^2\right] &= 0 \\
E\left[(G_t-B(S_t))[\nabla\ln\pi(A_t \mid S_t;\theta)]^2\right] &= 0 \\
E\left[B(S_t)[\nabla\ln\pi(A_t \mid S_t;\theta)]^2\right] &= E\left[G_t[\nabla\ln\pi(A_t \mid S_t;\theta)]^2\right] \\
E[B(S_t)]E\left[[\nabla\ln\pi(A_t \mid S_t;\theta)]^2\right] &= E\left[G_t[\nabla\ln\pi(A_t \mid S_t;\theta)]^2\right] \\
E[B(S_t)] &= \frac{E\left[G_t[\nabla\ln\pi(A_t \mid S_t;\theta)]^2\right]}{\displaystyle E\left[[\nabla\ln\pi(A_t \mid S_t;\theta)]^2\right]} \\
\end{split}</script></div></div><p><span>This means that the optimal baseline function should be close to the return </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="2.651ex" height="2.228ex" viewBox="0 -749.6 1141.3 959.2" role="img" focusable="false" style="vertical-align: -0.487ex;"><defs><path stroke-width="0" id="E1982-MJMATHI-47" d="M50 252Q50 367 117 473T286 641T490 704Q580 704 633 653Q642 643 648 636T656 626L657 623Q660 623 684 649Q691 655 699 663T715 679T725 690L740 705H746Q760 705 760 698Q760 694 728 561Q692 422 692 421Q690 416 687 415T669 413H653Q647 419 647 422Q647 423 648 429T650 449T651 481Q651 552 619 605T510 659Q492 659 471 656T418 643T357 615T294 567T236 496T189 394T158 260Q156 242 156 221Q156 173 170 136T206 79T256 45T308 28T353 24Q407 24 452 47T514 106Q517 114 529 161T541 214Q541 222 528 224T468 227H431Q425 233 425 235T427 254Q431 267 437 273H454Q494 271 594 271Q634 271 659 271T695 272T707 272Q721 272 721 263Q721 261 719 249Q714 230 709 228Q706 227 694 227Q674 227 653 224Q646 221 643 215T629 164Q620 131 614 108Q589 6 586 3Q584 1 581 1Q571 1 553 21T530 52Q530 53 528 52T522 47Q448 -22 322 -22Q201 -22 126 55T50 252Z"></path><path stroke-width="0" id="E1982-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1982-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1982-MJMATHI-74" x="1111" 
y="-213"></use></g></svg></span><script type="math/tex">G_t</script><span> 以 </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="19.003ex" height="2.903ex" viewBox="0 -915.7 8181.6 1250" role="img" focusable="false" style="vertical-align: -0.776ex;"><defs><path stroke-width="0" id="E1974-MJMAIN-5B" d="M118 -250V750H255V710H158V-210H255V-250H118Z"></path><path stroke-width="0" id="E1974-MJMAIN-2207" d="M46 676Q46 679 51 683H781Q786 679 786 676Q786 674 617 326T444 -26Q439 -33 416 -33T388 -26Q385 -22 216 326T46 676ZM697 596Q697 597 445 597T193 596Q195 591 319 336T445 80L697 596Z"></path><path stroke-width="0" id="E1974-MJMAIN-6C" d="M42 46H56Q95 46 103 60V68Q103 77 103 91T103 124T104 167T104 217T104 272T104 329Q104 366 104 407T104 482T104 542T103 586T103 603Q100 622 89 628T44 637H26V660Q26 683 28 683L38 684Q48 685 67 686T104 688Q121 689 141 690T171 693T182 694H185V379Q185 62 186 60Q190 52 198 49Q219 46 247 46H263V0H255L232 1Q209 2 183 2T145 3T107 3T57 1L34 0H26V46H42Z"></path><path stroke-width="0" id="E1974-MJMAIN-6E" d="M41 46H55Q94 46 102 60V68Q102 77 102 91T102 122T103 161T103 203Q103 234 103 269T102 328V351Q99 370 88 376T43 385H25V408Q25 431 27 431L37 432Q47 433 65 434T102 436Q119 437 138 438T167 441T178 442H181V402Q181 364 182 364T187 369T199 384T218 402T247 421T285 437Q305 442 336 442Q450 438 463 329Q464 322 464 190V104Q464 66 466 59T477 49Q498 46 526 46H542V0H534L510 1Q487 2 460 2T422 3Q319 3 310 0H302V46H318Q379 46 379 62Q380 64 380 200Q379 335 378 343Q372 371 358 385T334 402T308 404Q263 404 229 370Q202 343 195 315T187 232V168V108Q187 78 188 68T191 55T200 49Q221 46 249 46H265V0H257L234 1Q210 2 183 2T145 3Q42 3 33 0H25V46H41Z"></path><path stroke-width="0" id="E1974-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 
431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1974-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1974-MJMATHI-41" d="M208 74Q208 50 254 46Q272 46 272 35Q272 34 270 22Q267 8 264 4T251 0Q249 0 239 0T205 1T141 2Q70 2 50 0H42Q35 7 35 11Q37 38 48 46H62Q132 49 164 96Q170 102 345 401T523 704Q530 716 547 716H555H572Q578 707 578 706L606 383Q634 60 636 57Q641 46 701 46Q726 46 726 36Q726 34 723 22Q720 7 718 4T704 0Q701 0 690 0T651 1T578 2Q484 2 455 0H443Q437 6 437 9T439 27Q443 40 445 43L449 46H469Q523 49 533 63L521 213H283L249 155Q208 86 208 74ZM516 260Q516 271 504 416T490 562L463 519Q447 492 400 412L310 260L413 259Q516 259 516 260Z"></path><path stroke-width="0" id="E1974-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E1974-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path stroke-width="0" id="E1974-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 462Q176 
523 208 573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 577T585 459T569 456Q549 456 549 465Q549 471 550 475Q550 478 551 494T553 520Q553 554 544 579T526 616T501 641Q465 662 419 662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 -18T65 -22Q52 -22 52 -14Q52 -11 110 221Q112 227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 153Q144 114 160 87T203 47T255 29T308 24Z"></path><path stroke-width="0" id="E1974-MJMAIN-3B" d="M78 370Q78 394 95 412T138 430Q162 430 180 414T199 371Q199 346 182 328T139 310T96 327T78 370ZM78 60Q78 85 94 103T137 121Q202 121 202 8Q202 -44 183 -94T144 -169T118 -194Q115 -194 106 -186T95 -174Q94 -171 107 -155T137 -107T160 -38Q161 -32 162 -22T165 -4T165 4Q165 5 161 4T142 0Q110 0 94 18T78 60Z"></path><path stroke-width="0" id="E1974-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E1974-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path><path stroke-width="0" id="E1974-MJMAIN-5D" d="M22 710V750H159V-250H22V-210H119V710H22Z"></path><path stroke-width="0" id="E1974-MJMAIN-32" d="M109 429Q82 429 66 447T50 491Q50 562 103 614T235 666Q326 666 387 610T449 465Q449 422 429 383T381 315T301 241Q265 210 201 149L142 
93L218 92Q375 92 385 97Q392 99 409 186V189H449V186Q448 183 436 95T421 3V0H50V19V31Q50 38 56 46T86 81Q115 113 136 137Q145 147 170 174T204 211T233 244T261 278T284 308T305 340T320 369T333 401T340 431T343 464Q343 527 309 573T212 619Q179 619 154 602T119 569T109 550Q109 549 114 549Q132 549 151 535T170 489Q170 464 154 447T109 429Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1974-MJMAIN-5B" x="0" y="0"></use><use xlink:href="#E1974-MJMAIN-2207" x="278" y="0"></use><g transform="translate(1277,0)"><use xlink:href="#E1974-MJMAIN-6C"></use><use xlink:href="#E1974-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1974-MJMATHI-3C0" x="2278" y="0"></use><use xlink:href="#E1974-MJMAIN-28" x="2851" y="0"></use><g transform="translate(3240,0)"><use xlink:href="#E1974-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1974-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1974-MJMAIN-2223" x="4623" y="0"></use><g transform="translate(5179,0)"><use xlink:href="#E1974-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1974-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1974-MJMAIN-3B" x="6147" y="0"></use><use xlink:href="#E1974-MJMATHI-3B8" x="6592" y="0"></use><use xlink:href="#E1974-MJMAIN-29" x="7061" y="0"></use><g transform="translate(7450,0)"><use xlink:href="#E1974-MJMAIN-5D" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1974-MJMAIN-32" x="393" y="513"></use></g></g></svg></span><script type="math/tex">[\nabla\ln\pi(A_t \mid S_t;\theta)]^2</script><span> 为权重加权平均的结果，但是实际应用中，无法事先知道这个值，所以无法使用这样的基线函数。值得一提的是，当策略参数和价值参数同时需要学习的时候，算法的收敛性需要通过双时间轴 Robbins-Monro 算法（two timescale Robbins-Monro algorithm）来分析。</span></p><h3><a name="三异策回合更新策略梯度算法" class="md-header-anchor"></a><span>三、异策回合更新策略梯度算法</span></h3><p><span>在简单的策略梯度算法的基础上引入重要性采样，即可得到对应的异策算法。记行为策略为 </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 
100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="7.057ex" height="2.71ex" viewBox="0 -832.7 3038.6 1166.9" role="img" focusable="false" style="vertical-align: -0.776ex;"><defs><path stroke-width="0" id="E1975-MJMATHI-62" d="M73 647Q73 657 77 670T89 683Q90 683 161 688T234 694Q246 694 246 685T212 542Q204 508 195 472T180 418L176 399Q176 396 182 402Q231 442 283 442Q345 442 383 396T422 280Q422 169 343 79T173 -11Q123 -11 82 27T40 150V159Q40 180 48 217T97 414Q147 611 147 623T109 637Q104 637 101 637H96Q86 637 83 637T76 640T73 647ZM336 325V331Q336 405 275 405Q258 405 240 397T207 376T181 352T163 330L157 322L136 236Q114 150 114 114Q114 66 138 42Q154 26 178 26Q211 26 245 58Q270 81 285 114T318 219Q336 291 336 325Z"></path><path stroke-width="0" id="E1975-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1975-MJMATHI-61" d="M33 157Q33 258 109 349T280 441Q331 441 370 392Q386 422 416 422Q429 422 439 414T449 394Q449 381 412 234T374 68Q374 43 381 35T402 26Q411 27 422 35Q443 55 463 131Q469 151 473 152Q475 153 483 153H487Q506 153 506 144Q506 138 501 117T481 63T449 13Q436 0 417 -8Q409 -10 393 -10Q359 -10 336 5T306 36L300 51Q299 52 296 50Q294 48 292 46Q233 -10 172 -10Q117 -10 75 30T33 157ZM351 328Q351 334 346 350T323 385T277 405Q242 405 210 374T160 293Q131 214 119 129Q119 126 119 118T118 106Q118 61 136 44T179 26Q217 26 254 59T298 110Q300 114 325 217T351 328Z"></path><path stroke-width="0" id="E1975-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path stroke-width="0" id="E1975-MJMATHI-73" d="M131 289Q131 321 147 354T203 415T300 442Q362 442 390 415T419 355Q419 323 402 308T364 292Q351 292 340 300T328 326Q328 342 337 354T354 372T367 
378Q368 378 368 379Q368 382 361 388T336 399T297 405Q249 405 227 379T204 326Q204 301 223 291T278 274T330 259Q396 230 396 163Q396 135 385 107T352 51T289 7T195 -10Q118 -10 86 19T53 87Q53 126 74 143T118 160Q133 160 146 151T160 120Q160 94 142 76T111 58Q109 57 108 57T107 55Q108 52 115 47T146 34T201 27Q237 27 263 38T301 66T318 97T323 122Q323 150 302 164T254 181T195 196T148 231Q131 256 131 289Z"></path><path stroke-width="0" id="E1975-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1975-MJMATHI-62" x="0" y="0"></use><use xlink:href="#E1975-MJMAIN-28" x="429" y="0"></use><use xlink:href="#E1975-MJMATHI-61" x="818" y="0"></use><use xlink:href="#E1975-MJMAIN-2223" x="1624" y="0"></use><use xlink:href="#E1975-MJMATHI-73" x="2180" y="0"></use><use xlink:href="#E1975-MJMAIN-29" x="2649" y="0"></use></g></svg></span><script type="math/tex">b(a \mid s)</script><span> ，有：</span></p><div contenteditable="false" spellcheck="false" class="mathjax-block md-end-block md-math-block md-rawblock" id="mathjax-n46" cid="n46" mdtype="math_block"><div class="md-rawblock-container md-math-container" tabindex="-1"><div class="MathJax_SVG_Display" style="text-align: center;"><span class="MathJax_SVG" id="MathJax-Element-831-Frame" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="71.533ex" height="24.704ex" viewBox="0 -5567.4 30798.7 10636.3" role="img" focusable="false" style="vertical-align: -11.621ex; margin-bottom: -0.153ex; max-width: 100%;"><defs><path stroke-width="0" id="E1933-MJMATHI-45" d="M492 213Q472 213 472 226Q472 230 477 250T482 285Q482 316 461 323T364 330H312Q311 328 277 
192T243 52Q243 48 254 48T334 46Q428 46 458 48T518 61Q567 77 599 117T670 248Q680 270 683 272Q690 274 698 274Q718 274 718 261Q613 7 608 2Q605 0 322 0H133Q31 0 31 11Q31 13 34 25Q38 41 42 43T65 46Q92 46 125 49Q139 52 144 61Q146 66 215 342T285 622Q285 629 281 629Q273 632 228 634H197Q191 640 191 642T193 659Q197 676 203 680H757Q764 676 764 669Q764 664 751 557T737 447Q735 440 717 440H705Q698 445 698 453L701 476Q704 500 704 528Q704 558 697 578T678 609T643 625T596 632T532 634H485Q397 633 392 631Q388 629 386 622Q385 619 355 499T324 377Q347 376 372 376H398Q464 376 489 391T534 472Q538 488 540 490T557 493Q562 493 565 493T570 492T572 491T574 487T577 483L544 351Q511 218 508 216Q505 213 492 213Z"></path><path stroke-width="0" id="E1933-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1933-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1933-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" 
id="E1933-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path><path stroke-width="0" id="E1933-MJMAIN-5B" d="M118 -250V750H255V710H158V-210H255V-250H118Z"></path><path stroke-width="0" id="E1933-MJMATHI-3B3" d="M31 249Q11 249 11 258Q11 275 26 304T66 365T129 418T206 441Q233 441 239 440Q287 429 318 386T371 255Q385 195 385 170Q385 166 386 166L398 193Q418 244 443 300T486 391T508 430Q510 431 524 431H537Q543 425 543 422Q543 418 522 378T463 251T391 71Q385 55 378 6T357 -100Q341 -165 330 -190T303 -216Q286 -216 286 -188Q286 -138 340 32L346 51L347 69Q348 79 348 100Q348 257 291 317Q251 355 196 355Q148 355 108 329T51 260Q49 251 47 251Q45 249 31 249Z"></path><path stroke-width="0" id="E1933-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E1933-MJMATHI-47" d="M50 252Q50 367 117 473T286 641T490 704Q580 704 633 653Q642 643 648 636T656 626L657 623Q660 623 684 649Q691 655 699 663T715 679T725 690L740 705H746Q760 705 760 698Q760 694 728 561Q692 422 692 421Q690 416 687 415T669 413H653Q647 419 647 422Q647 423 648 429T650 449T651 481Q651 552 619 605T510 659Q492 659 471 656T418 643T357 615T294 567T236 496T189 394T158 260Q156 242 156 221Q156 173 170 136T206 79T256 45T308 28T353 24Q407 24 452 47T514 106Q517 114 529 161T541 214Q541 222 528 224T468 227H431Q425 233 425 235T427 254Q431 267 437 273H454Q494 
271 594 271Q634 271 659 271T695 272T707 272Q721 272 721 263Q721 261 719 249Q714 230 709 228Q706 227 694 227Q674 227 653 224Q646 221 643 215T629 164Q620 131 614 108Q589 6 586 3Q584 1 581 1Q571 1 553 21T530 52Q530 53 528 52T522 47Q448 -22 322 -22Q201 -22 126 55T50 252Z"></path><path stroke-width="0" id="E1933-MJMAIN-2207" d="M46 676Q46 679 51 683H781Q786 679 786 676Q786 674 617 326T444 -26Q439 -33 416 -33T388 -26Q385 -22 216 326T46 676ZM697 596Q697 597 445 597T193 596Q195 591 319 336T445 80L697 596Z"></path><path stroke-width="0" id="E1933-MJMAIN-6C" d="M42 46H56Q95 46 103 60V68Q103 77 103 91T103 124T104 167T104 217T104 272T104 329Q104 366 104 407T104 482T104 542T103 586T103 603Q100 622 89 628T44 637H26V660Q26 683 28 683L38 684Q48 685 67 686T104 688Q121 689 141 690T171 693T182 694H185V379Q185 62 186 60Q190 52 198 49Q219 46 247 46H263V0H255L232 1Q209 2 183 2T145 3T107 3T57 1L34 0H26V46H42Z"></path><path stroke-width="0" id="E1933-MJMAIN-6E" d="M41 46H55Q94 46 102 60V68Q102 77 102 91T102 122T103 161T103 203Q103 234 103 269T102 328V351Q99 370 88 376T43 385H25V408Q25 431 27 431L37 432Q47 433 65 434T102 436Q119 437 138 438T167 441T178 442H181V402Q181 364 182 364T187 369T199 384T218 402T247 421T285 437Q305 442 336 442Q450 438 463 329Q464 322 464 190V104Q464 66 466 59T477 49Q498 46 526 46H542V0H534L510 1Q487 2 460 2T422 3Q319 3 310 0H302V46H318Q379 46 379 62Q380 64 380 200Q379 335 378 343Q372 371 358 385T334 402T308 404Q263 404 229 370Q202 343 195 315T187 232V168V108Q187 78 188 68T191 55T200 49Q221 46 249 46H265V0H257L234 1Q210 2 183 2T145 3Q42 3 33 0H25V46H41Z"></path><path stroke-width="0" id="E1933-MJMATHI-41" d="M208 74Q208 50 254 46Q272 46 272 35Q272 34 270 22Q267 8 264 4T251 0Q249 0 239 0T205 1T141 2Q70 2 50 0H42Q35 7 35 11Q37 38 48 46H62Q132 49 164 96Q170 102 345 401T523 704Q530 716 547 716H555H572Q578 707 578 706L606 383Q634 60 636 57Q641 46 701 46Q726 46 726 36Q726 34 723 22Q720 7 718 4T704 0Q701 0 690 0T651 1T578 2Q484 2 455 0H443Q437 6 437 9T439 27Q443 40 445 
43L449 46H469Q523 49 533 63L521 213H283L249 155Q208 86 208 74ZM516 260Q516 271 504 416T490 562L463 519Q447 492 400 412L310 260L413 259Q516 259 516 260Z"></path><path stroke-width="0" id="E1933-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path stroke-width="0" id="E1933-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 462Q176 523 208 573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 577T585 459T569 456Q549 456 549 465Q549 471 550 475Q550 478 551 494T553 520Q553 554 544 579T526 616T501 641Q465 662 419 662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 -18T65 -22Q52 -22 52 -14Q52 -11 110 221Q112 227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 153Q144 114 160 87T203 47T255 29T308 24Z"></path><path stroke-width="0" id="E1933-MJMAIN-3B" d="M78 370Q78 394 95 412T138 430Q162 430 180 414T199 371Q199 346 182 328T139 310T96 327T78 370ZM78 60Q78 85 94 103T137 121Q202 121 202 8Q202 -44 183 -94T144 -169T118 -194Q115 -194 106 -186T95 -174Q94 -171 107 -155T137 -107T160 -38Q161 -32 162 -22T165 -4T165 4Q165 5 161 4T142 0Q110 0 94 18T78 60Z"></path><path stroke-width="0" id="E1933-MJMAIN-5D" d="M22 710V750H159V-250H22V-210H119V710H22Z"></path><path stroke-width="0" id="E1933-MJMAIN-3D" d="M56 347Q56 360 70 367H707Q722 359 722 347Q722 336 708 328L390 327H72Q56 332 56 347ZM56 153Q56 168 72 173H708Q722 163 722 153Q722 140 707 133H70Q56 140 56 153Z"></path><path stroke-width="0" id="E1933-MJSZ2-2211" d="M60 948Q63 950 665 950H1267L1325 815Q1384 677 1388 669H1348L1341 683Q1320 724 1285 761Q1235 809 1174 838T1033 881T882 898T699 902H574H543H251L259 891Q722 258 724 252Q725 250 724 246Q721 243 460 -56L196 
-356Q196 -357 407 -357Q459 -357 548 -357T676 -358Q812 -358 896 -353T1063 -332T1204 -283T1307 -196Q1328 -170 1348 -124H1388Q1388 -125 1381 -145T1356 -210T1325 -294L1267 -449L666 -450Q64 -450 61 -448Q55 -446 55 -439Q55 -437 57 -433L590 177Q590 178 557 222T452 366T322 544L56 909L55 924Q55 945 60 948Z"></path><path stroke-width="0" id="E1933-MJMATHI-61" d="M33 157Q33 258 109 349T280 441Q331 441 370 392Q386 422 416 422Q429 422 439 414T449 394Q449 381 412 234T374 68Q374 43 381 35T402 26Q411 27 422 35Q443 55 463 131Q469 151 473 152Q475 153 483 153H487Q506 153 506 144Q506 138 501 117T481 63T449 13Q436 0 417 -8Q409 -10 393 -10Q359 -10 336 5T306 36L300 51Q299 52 296 50Q294 48 292 46Q233 -10 172 -10Q117 -10 75 30T33 157ZM351 328Q351 334 346 350T323 385T277 405Q242 405 210 374T160 293Q131 214 119 129Q119 126 119 118T118 106Q118 61 136 44T179 26Q217 26 254 59T298 110Q300 114 325 217T351 328Z"></path><path stroke-width="0" id="E1933-MJMATHI-73" d="M131 289Q131 321 147 354T203 415T300 442Q362 442 390 415T419 355Q419 323 402 308T364 292Q351 292 340 300T328 326Q328 342 337 354T354 372T367 378Q368 378 368 379Q368 382 361 388T336 399T297 405Q249 405 227 379T204 326Q204 301 223 291T278 274T330 259Q396 230 396 163Q396 135 385 107T352 51T289 7T195 -10Q118 -10 86 19T53 87Q53 126 74 143T118 160Q133 160 146 151T160 120Q160 94 142 76T111 58Q109 57 108 57T107 55Q108 52 115 47T146 34T201 27Q237 27 263 38T301 66T318 97T323 122Q323 150 302 164T254 181T195 196T148 231Q131 256 131 289Z"></path><path stroke-width="0" id="E1933-MJMATHI-62" d="M73 647Q73 657 77 670T89 683Q90 683 161 688T234 694Q246 694 246 685T212 542Q204 508 195 472T180 418L176 399Q176 396 182 402Q231 442 283 442Q345 442 383 396T422 280Q422 169 343 79T173 -11Q123 -11 82 27T40 150V159Q40 180 48 217T97 414Q147 611 147 623T109 637Q104 637 101 637H96Q86 637 83 637T76 640T73 647ZM336 325V331Q336 405 275 405Q258 405 240 397T207 376T181 352T163 330L157 322L136 236Q114 150 114 114Q114 66 138 42Q154 26 178 26Q211 26 245 58Q270 81 285 
114T318 219Q336 291 336 325Z"></path><path stroke-width="0" id="E1933-MJMAIN-31" d="M213 578L200 573Q186 568 160 563T102 556H83V602H102Q149 604 189 617T245 641T273 663Q275 666 285 666Q294 666 302 660V361L303 61Q310 54 315 52T339 48T401 46H427V0H416Q395 3 257 3Q121 3 100 0H88V46H114Q136 46 152 46T177 47T193 50T201 52T207 57T213 61V578Z"></path><path stroke-width="0" id="E1933-MJSZ3-5B" d="M247 -949V1450H516V1388H309V-887H516V-949H247Z"></path><path stroke-width="0" id="E1933-MJSZ3-5D" d="M11 1388V1450H280V-949H11V-887H218V1388H11Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><g transform="translate(167,0)"><g transform="translate(-19,0)"><g transform="translate(0,4553)"><use xlink:href="#E1933-MJMATHI-45" x="0" y="0"></use><g transform="translate(738,-186)"><use transform="scale(0.707)" xlink:href="#E1933-MJMATHI-3C0" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1933-MJMAIN-28" x="573" y="0"></use><use transform="scale(0.707)" xlink:href="#E1933-MJMATHI-3B8" x="962" y="0"></use><use transform="scale(0.707)" xlink:href="#E1933-MJMAIN-29" x="1431" y="0"></use></g><use xlink:href="#E1933-MJMAIN-5B" x="2124" y="0"></use><g transform="translate(2402,0)"><use xlink:href="#E1933-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1933-MJMATHI-74" x="778" y="583"></use></g><g transform="translate(3308,0)"><use xlink:href="#E1933-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1933-MJMATHI-74" x="1111" y="-213"></use></g><use xlink:href="#E1933-MJMAIN-2207" x="4450" y="0"></use><g transform="translate(5449,0)"><use xlink:href="#E1933-MJMAIN-6C"></use><use xlink:href="#E1933-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1933-MJMATHI-3C0" x="6450" y="0"></use><use xlink:href="#E1933-MJMAIN-28" x="7023" y="0"></use><g transform="translate(7412,0)"><use xlink:href="#E1933-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" 
xlink:href="#E1933-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1933-MJMAIN-2223" x="8795" y="0"></use><g transform="translate(9351,0)"><use xlink:href="#E1933-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1933-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1933-MJMAIN-3B" x="10319" y="0"></use><use xlink:href="#E1933-MJMATHI-3B8" x="10764" y="0"></use><use xlink:href="#E1933-MJMAIN-29" x="11233" y="0"></use><use xlink:href="#E1933-MJMAIN-5D" x="11622" y="0"></use></g></g><g transform="translate(11882,0)"><g transform="translate(0,4553)"><use xlink:href="#E1933-MJMAIN-3D" x="277" y="0"></use><g transform="translate(1333,0)"><use xlink:href="#E1933-MJSZ2-2211" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1933-MJMATHI-61" x="756" y="-1485"></use></g><use xlink:href="#E1933-MJMATHI-3C0" x="2944" y="0"></use><use xlink:href="#E1933-MJMAIN-28" x="3517" y="0"></use><use xlink:href="#E1933-MJMATHI-61" x="3906" y="0"></use><use xlink:href="#E1933-MJMAIN-2223" x="4713" y="0"></use><use xlink:href="#E1933-MJMATHI-73" x="5268" y="0"></use><use xlink:href="#E1933-MJMAIN-3B" x="5737" y="0"></use><use xlink:href="#E1933-MJMATHI-3B8" x="6182" y="0"></use><use xlink:href="#E1933-MJMAIN-29" x="6651" y="0"></use><g transform="translate(7040,0)"><use xlink:href="#E1933-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1933-MJMATHI-74" x="778" y="583"></use></g><g transform="translate(7946,0)"><use xlink:href="#E1933-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1933-MJMATHI-74" x="1111" y="-213"></use></g><use xlink:href="#E1933-MJMAIN-2207" x="9087" y="0"></use><g transform="translate(10087,0)"><use xlink:href="#E1933-MJMAIN-6C"></use><use xlink:href="#E1933-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1933-MJMATHI-3C0" x="11087" y="0"></use><use xlink:href="#E1933-MJMAIN-28" x="11660" y="0"></use><use xlink:href="#E1933-MJMATHI-61" x="12049" 
y="0"></use><use xlink:href="#E1933-MJMAIN-2223" x="12856" y="0"></use><use xlink:href="#E1933-MJMATHI-73" x="13412" y="0"></use><use xlink:href="#E1933-MJMAIN-3B" x="13881" y="0"></use><use xlink:href="#E1933-MJMATHI-3B8" x="14326" y="0"></use><use xlink:href="#E1933-MJMAIN-29" x="14795" y="0"></use></g><g transform="translate(0,1652)"><use xlink:href="#E1933-MJMAIN-3D" x="277" y="0"></use><g transform="translate(1333,0)"><use xlink:href="#E1933-MJSZ2-2211" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1933-MJMATHI-61" x="756" y="-1485"></use></g><use xlink:href="#E1933-MJMATHI-62" x="2944" y="0"></use><use xlink:href="#E1933-MJMAIN-28" x="3373" y="0"></use><use xlink:href="#E1933-MJMATHI-61" x="3762" y="0"></use><use xlink:href="#E1933-MJMAIN-2223" x="4569" y="0"></use><use xlink:href="#E1933-MJMATHI-73" x="5124" y="0"></use><use xlink:href="#E1933-MJMAIN-29" x="5593" y="0"></use><g transform="translate(5982,0)"><g transform="translate(120,0)"><rect stroke="none" width="4216" height="60" x="0" y="220"></rect><g transform="translate(60,693)"><use xlink:href="#E1933-MJMATHI-3C0" x="0" y="0"></use><use xlink:href="#E1933-MJMAIN-28" x="573" y="0"></use><use xlink:href="#E1933-MJMATHI-61" x="962" y="0"></use><use xlink:href="#E1933-MJMAIN-2223" x="1768" y="0"></use><use xlink:href="#E1933-MJMATHI-73" x="2324" y="0"></use><use xlink:href="#E1933-MJMAIN-3B" x="2793" y="0"></use><use xlink:href="#E1933-MJMATHI-3B8" x="3238" y="0"></use><use xlink:href="#E1933-MJMAIN-29" x="3707" y="0"></use></g><g transform="translate(588,-694)"><use xlink:href="#E1933-MJMATHI-62" x="0" y="0"></use><use xlink:href="#E1933-MJMAIN-28" x="429" y="0"></use><use xlink:href="#E1933-MJMATHI-61" x="818" y="0"></use><use xlink:href="#E1933-MJMAIN-2223" x="1624" y="0"></use><use xlink:href="#E1933-MJMATHI-73" x="2180" y="0"></use><use xlink:href="#E1933-MJMAIN-29" x="2649" y="0"></use></g></g></g><g transform="translate(10439,0)"><use xlink:href="#E1933-MJMATHI-3B3" x="0" 
y="0"></use><use transform="scale(0.707)" xlink:href="#E1933-MJMATHI-74" x="778" y="583"></use></g><g transform="translate(11344,0)"><use xlink:href="#E1933-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1933-MJMATHI-74" x="1111" y="-213"></use></g><use xlink:href="#E1933-MJMAIN-2207" x="12486" y="0"></use><g transform="translate(13485,0)"><use xlink:href="#E1933-MJMAIN-6C"></use><use xlink:href="#E1933-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1933-MJMATHI-3C0" x="14486" y="0"></use><use xlink:href="#E1933-MJMAIN-28" x="15059" y="0"></use><use xlink:href="#E1933-MJMATHI-61" x="15448" y="0"></use><use xlink:href="#E1933-MJMAIN-2223" x="16255" y="0"></use><use xlink:href="#E1933-MJMATHI-73" x="16810" y="0"></use><use xlink:href="#E1933-MJMAIN-3B" x="17279" y="0"></use><use xlink:href="#E1933-MJMATHI-3B8" x="17724" y="0"></use><use xlink:href="#E1933-MJMAIN-29" x="18193" y="0"></use></g><g transform="translate(0,-1148)"><use xlink:href="#E1933-MJMAIN-3D" x="277" y="0"></use><g transform="translate(1333,0)"><use xlink:href="#E1933-MJSZ2-2211" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1933-MJMATHI-61" x="756" y="-1485"></use></g><use xlink:href="#E1933-MJMATHI-62" x="2944" y="0"></use><use xlink:href="#E1933-MJMAIN-28" x="3373" y="0"></use><use xlink:href="#E1933-MJMATHI-61" x="3762" y="0"></use><use xlink:href="#E1933-MJMAIN-2223" x="4569" y="0"></use><use xlink:href="#E1933-MJMATHI-73" x="5124" y="0"></use><use xlink:href="#E1933-MJMAIN-29" x="5593" y="0"></use><g transform="translate(5982,0)"><g transform="translate(120,0)"><rect stroke="none" width="3158" height="60" x="0" y="220"></rect><use xlink:href="#E1933-MJMAIN-31" x="1329" y="676"></use><g transform="translate(60,-694)"><use xlink:href="#E1933-MJMATHI-62" x="0" y="0"></use><use xlink:href="#E1933-MJMAIN-28" x="429" y="0"></use><use xlink:href="#E1933-MJMATHI-61" x="818" y="0"></use><use xlink:href="#E1933-MJMAIN-2223" x="1624" y="0"></use><use 
xlink:href="#E1933-MJMATHI-73" x="2180" y="0"></use><use xlink:href="#E1933-MJMAIN-29" x="2649" y="0"></use></g></g></g><g transform="translate(9381,0)"><use xlink:href="#E1933-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1933-MJMATHI-74" x="778" y="583"></use></g><g transform="translate(10287,0)"><use xlink:href="#E1933-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1933-MJMATHI-74" x="1111" y="-213"></use></g><use xlink:href="#E1933-MJMAIN-2207" x="11428" y="0"></use><use xlink:href="#E1933-MJMATHI-3C0" x="12261" y="0"></use><use xlink:href="#E1933-MJMAIN-28" x="12834" y="0"></use><use xlink:href="#E1933-MJMATHI-61" x="13223" y="0"></use><use xlink:href="#E1933-MJMAIN-2223" x="14030" y="0"></use><use xlink:href="#E1933-MJMATHI-73" x="14585" y="0"></use><use xlink:href="#E1933-MJMAIN-3B" x="15054" y="0"></use><use xlink:href="#E1933-MJMATHI-3B8" x="15499" y="0"></use><use xlink:href="#E1933-MJMAIN-29" x="15968" y="0"></use></g><g transform="translate(0,-4054)"><use xlink:href="#E1933-MJMAIN-3D" x="277" y="0"></use><g transform="translate(1333,0)"><use xlink:href="#E1933-MJMATHI-45" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1933-MJMATHI-62" x="1043" y="-213"></use></g><g transform="translate(2641,0)"><use xlink:href="#E1933-MJSZ3-5B"></use><g transform="translate(528,0)"><g transform="translate(120,0)"><rect stroke="none" width="4234" height="60" x="0" y="220"></rect><use xlink:href="#E1933-MJMAIN-31" x="1867" y="676"></use><g transform="translate(60,-694)"><use xlink:href="#E1933-MJMATHI-62" x="0" y="0"></use><use xlink:href="#E1933-MJMAIN-28" x="429" y="0"></use><g transform="translate(818,0)"><use xlink:href="#E1933-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1933-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1933-MJMAIN-2223" x="2201" y="0"></use><g transform="translate(2756,0)"><use xlink:href="#E1933-MJMATHI-53" x="0" y="0"></use><use 
transform="scale(0.707)" xlink:href="#E1933-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1933-MJMAIN-29" x="3725" y="0"></use></g></g></g><g transform="translate(5002,0)"><use xlink:href="#E1933-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1933-MJMATHI-74" x="778" y="583"></use></g><g transform="translate(5907,0)"><use xlink:href="#E1933-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1933-MJMATHI-74" x="1111" y="-213"></use></g><use xlink:href="#E1933-MJMAIN-2207" x="7049" y="0"></use><use xlink:href="#E1933-MJMATHI-3C0" x="7882" y="0"></use><use xlink:href="#E1933-MJMAIN-28" x="8455" y="0"></use><g transform="translate(8844,0)"><use xlink:href="#E1933-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1933-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1933-MJMAIN-2223" x="10227" y="0"></use><g transform="translate(10782,0)"><use xlink:href="#E1933-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1933-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1933-MJMAIN-3B" x="11751" y="0"></use><use xlink:href="#E1933-MJMATHI-3B8" x="12195" y="0"></use><use xlink:href="#E1933-MJMAIN-29" x="12664" y="0"></use><use xlink:href="#E1933-MJSZ3-5D" x="13053" y="-1"></use></g></g></g></g></g></svg></span></div><script type="math/tex; mode=display" id="MathJax-Element-831">\begin{split}
E_{\pi(\theta)}[\gamma^tG_t \nabla\ln\pi(A_t \mid S_t;\theta)] &= \sum_a \pi(a \mid s;\theta)\gamma^tG_t \nabla\ln\pi(a \mid s;\theta) \\
&= \sum_a b(a \mid s) \frac{\pi(a \mid s;\theta)}{b(a \mid s)} \gamma^tG_t \nabla\ln\pi(a \mid s;\theta) \\
&= \sum_a b(a \mid s) \frac{1}{b(a \mid s)} \gamma^tG_t \nabla\pi(a \mid s;\theta) \\
&= E_{b}\left[ \frac{1}{b(A_t \mid S_t)}\gamma^tG_t \nabla\pi(A_t \mid S_t;\theta) \right]
\end{split}</script></div></div><p><span>Therefore, an off-policy algorithm based on importance sampling only needs to change the gradient expression of the expected return in update rule </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="2.968ex" height="2.71ex" viewBox="0 -832.7 1278 1166.9" role="img" focusable="false" style="vertical-align: -0.776ex;"><defs><path stroke-width="0" id="E2008-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E2008-MJMAIN-33" d="M127 463Q100 463 85 480T69 524Q69 579 117 622T233 665Q268 665 277 664Q351 652 390 611T430 522Q430 470 396 421T302 350L299 348Q299 347 308 345T337 336T375 315Q457 262 457 175Q457 96 395 37T238 -22Q158 -22 100 21T42 130Q42 158 60 175T105 193Q133 193 151 175T169 130Q169 119 166 110T159 94T148 82T136 74T126 70T118 67L114 66Q165 21 238 21Q293 21 321 74Q338 107 338 175V195Q338 290 274 322Q259 328 213 329L171 330L168 332Q166 335 166 348Q166 366 174 366Q202 366 232 371Q266 376 294 413T322 525V533Q322 590 287 612Q265 626 240 626Q208 626 181 615T143 592T132 580H135Q138 579 143 578T153 573T165 566T175 555T183 540T186 520Q186 498 172 481T127 463Z"></path><path stroke-width="0" id="E2008-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><a class="mjx-svg-href" xlink:href="#mjx-eqn-eq%3A3"><rect width="1278" height="1000" y="-250" fill="none" stroke="none" pointer-events="all"></rect><g class="MathJax_ref"><use xlink:href="#E2008-MJMAIN-28"></use><use xlink:href="#E2008-MJMAIN-33" x="389" 
y="0"></use><use xlink:href="#E2008-MJMAIN-29" x="889" y="0"></use></g></a></g></svg></span><script type="math/tex">\eqref{eq:3}</script><span>. The resulting algorithm is as follows:</span></p><div contenteditable="false" spellcheck="false" class="mathjax-block md-end-block md-math-block md-rawblock" id="mathjax-n48" cid="n48" mdtype="math_block"><div class="md-rawblock-container md-math-container" tabindex="-1"><div class="MathJax_SVG_Display"><span class="MathJax_SVG" id="MathJax-Element-832-Frame" tabindex="-1" style="font-size: 100%; display: inline-block; zoom: 0.96231;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="102.244ex" height="32.324ex" viewBox="-18.1 -43.5 44021.5 13917.4" role="img" focusable="false" style="vertical-align: -32.223ex; margin-left: -0.042ex; max-width: 100%;"><defs><path stroke-width="0" id="E1934-MJMAINB-37" d="M256 -11Q231 -11 208 5T185 65Q185 105 193 146T212 220T241 289T275 349T312 402T346 445T377 479T397 502L400 504H301Q156 503 150 497Q142 491 134 456T126 407H64V411Q65 414 82 544T99 675T130 676H161V673Q161 669 162 666T167 661T173 657T181 654T190 652T200 651T210 650T220 649T229 648Q237 648 254 647T276 646Q277 646 426 644H558V620V607Q558 596 551 586T509 537Q489 515 476 500Q390 401 384 393Q349 339 337 259T324 113T322 38Q307 -11 256 -11Z"></path><path stroke-width="0" id="E1934-MJMAINB-2D" d="M13 166V278H318V166H13Z"></path><path stroke-width="0" id="E1934-MJMAINB-33" d="M80 503Q80 565 133 610T274 655Q366 655 421 623T491 538Q493 528 493 510Q493 446 453 407T361 348L376 344Q452 324 489 281T526 184Q526 152 514 121T474 58T392 8T265 -11Q175 -11 111 34T48 152Q50 187 72 209T132 232Q171 232 193 208T216 147Q216 136 214 126T207 108T197 94T187 84T178 77T170 72L168 71Q168 70 179 65T215 54T266 48H270Q331 48 350 105Q358 128 358 185Q358 239 348 268T309 313Q292 321 242 322Q205 322 198 324T191 341V348Q191 366 196 369T232 375Q239 375 247 376T260 377T268 378Q284 383 297 393T326 436T341 517Q341 536 339 547T331 573T308 593T266 600Q248 600 241 599Q214 593 183 
576Q234 556 234 503Q234 462 210 444T157 426Q126 426 103 446T80 503Z"></path><path stroke-width="0" id="E1934-MJMAIN-22EF" d="M78 250Q78 274 95 292T138 310Q162 310 180 294T199 251Q199 226 182 208T139 190T96 207T78 250ZM525 250Q525 274 542 292T585 310Q609 310 627 294T646 251Q646 226 629 208T586 190T543 207T525 250ZM972 250Q972 274 989 292T1032 310Q1056 310 1074 294T1093 251Q1093 226 1076 208T1033 190T990 207T972 250Z"></path><path stroke-width="0" id="E1934-MJMAIN-37" d="M55 458Q56 460 72 567L88 674Q88 676 108 676H128V672Q128 662 143 655T195 646T364 644H485V605L417 512Q408 500 387 472T360 435T339 403T319 367T305 330T292 284T284 230T278 162T275 80Q275 66 275 52T274 28V19Q270 2 255 -10T221 -22Q210 -22 200 -19T179 0T168 40Q168 198 265 368Q285 400 349 489L395 552H302Q128 552 119 546Q113 543 108 522T98 479L95 458V455H55V458Z"></path><path stroke-width="0" id="E1934-MJMAIN-2D" d="M11 179V252H277V179H11Z"></path><path stroke-width="0" id="E1934-MJMAIN-31" d="M213 578L200 573Q186 568 160 563T102 556H83V602H102Q149 604 189 617T245 641T273 663Q275 666 285 666Q294 666 302 660V361L303 61Q310 54 315 52T339 48T401 46H427V0H416Q395 3 257 3Q121 3 100 0H88V46H114Q136 46 152 46T177 47T193 50T201 52T207 57T213 61V578Z"></path><path stroke-width="0" id="E1934-MJMAIN-32" d="M109 429Q82 429 66 447T50 491Q50 562 103 614T235 666Q326 666 387 610T449 465Q449 422 429 383T381 315T301 241Q265 210 201 149L142 93L218 92Q375 92 385 97Q392 99 409 186V189H449V186Q448 183 436 95T421 3V0H50V19V31Q50 38 56 46T86 81Q115 113 136 137Q145 147 170 174T204 211T233 244T261 278T284 308T305 340T320 369T333 401T340 431T343 464Q343 527 309 573T212 619Q179 619 154 602T119 569T109 550Q109 549 114 549Q132 549 151 535T170 489Q170 464 154 447T109 429Z"></path><path stroke-width="0" id="E1934-MJMAIN-2E" d="M78 60Q78 84 95 102T138 120Q162 120 180 104T199 61Q199 36 182 18T139 0T96 17T78 60Z"></path><path stroke-width="0" id="E1934-MJMATHI-62" d="M73 647Q73 657 77 670T89 683Q90 683 161 688T234 694Q246 694 246 685T212 
542Q204 508 195 472T180 418L176 399Q176 396 182 402Q231 442 283 442Q345 442 383 396T422 280Q422 169 343 79T173 -11Q123 -11 82 27T40 150V159Q40 180 48 217T97 414Q147 611 147 623T109 637Q104 637 101 637H96Q86 637 83 637T76 640T73 647ZM336 325V331Q336 405 275 405Q258 405 240 397T207 376T181 352T163 330L157 322L136 236Q114 150 114 114Q114 66 138 42Q154 26 178 26Q211 26 245 58Q270 81 285 114T318 219Q336 291 336 325Z"></path><path stroke-width="0" id="E1934-MJMAIN-226B" d="M55 539T55 547T60 561T74 567Q81 567 207 498Q297 449 365 412Q633 265 636 261Q639 255 639 250Q639 241 626 232Q614 224 365 88Q83 -65 79 -66Q76 -67 73 -67Q65 -67 60 -61T55 -47Q55 -39 61 -33Q62 -33 95 -15T193 39T320 109L321 110H322L323 111H324L325 112L326 113H327L329 114H330L331 115H332L333 116L334 117H335L336 118H337L338 119H339L340 120L341 121H342L343 122H344L345 123H346L347 124L348 125H349L351 126H352L353 127H354L355 128L356 129H357L358 130H359L360 131H361L362 132L363 133H364L365 134H366L367 135H368L369 136H370L371 137L372 138H373L374 139H375L376 140L378 141L576 251Q63 530 62 533Q55 539 55 547ZM360 539T360 547T365 561T379 567Q386 567 512 498Q602 449 670 412Q938 265 941 261Q944 255 944 250Q944 241 931 232Q919 224 670 88Q388 -65 384 -66Q381 -67 378 -67Q370 -67 365 -61T360 -47Q360 -39 366 -33Q367 -33 400 -15T498 39T625 109L626 110H627L628 111H629L630 112L631 113H632L634 114H635L636 115H637L638 116L639 117H640L641 118H642L643 119H644L645 120L646 121H647L648 122H649L650 123H651L652 124L653 125H654L656 126H657L658 127H659L660 128L661 129H662L663 130H664L665 131H666L667 132L668 133H669L670 134H671L672 135H673L674 136H675L676 137L677 138H678L679 139H680L681 140L683 141L881 251Q368 530 367 533Q360 539 360 547Z"></path><path stroke-width="0" id="E1934-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 
358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1934-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1934-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E1934-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path><path stroke-width="0" id="E1934-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 462Q176 523 208 573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 577T585 459T569 456Q549 456 549 465Q549 471 550 475Q550 478 551 494T553 520Q553 554 544 579T526 616T501 641Q465 662 419 662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 -18T65 -22Q52 -22 52 -14Q52 -11 110 221Q112 227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 153Q144 
114 160 87T203 47T255 29T308 24Z"></path><path stroke-width="0" id="E1934-MJMAIN-30" d="M96 585Q152 666 249 666Q297 666 345 640T423 548Q460 465 460 320Q460 165 417 83Q397 41 362 16T301 -15T250 -22Q224 -22 198 -16T137 16T82 83Q39 165 39 320Q39 494 96 585ZM321 597Q291 629 250 629Q208 629 178 597Q153 571 145 525T137 333Q137 175 145 125T181 46Q209 16 250 16Q290 16 318 46Q347 76 354 130T362 333Q362 478 354 524T321 597Z"></path><path stroke-width="0" id="E1934-MJMAIN-2C" d="M78 35T78 60T94 103T137 121Q165 121 187 96T210 8Q210 -27 201 -60T180 -117T154 -158T130 -185T117 -194Q113 -194 104 -185T95 -172Q95 -168 106 -156T131 -126T157 -76T173 -3V9L172 8Q170 7 167 6T161 3T152 1T140 0Q113 0 96 17Z"></path><path stroke-width="0" id="E1934-MJMATHI-41" d="M208 74Q208 50 254 46Q272 46 272 35Q272 34 270 22Q267 8 264 4T251 0Q249 0 239 0T205 1T141 2Q70 2 50 0H42Q35 7 35 11Q37 38 48 46H62Q132 49 164 96Q170 102 345 401T523 704Q530 716 547 716H555H572Q578 707 578 706L606 383Q634 60 636 57Q641 46 701 46Q726 46 726 36Q726 34 723 22Q720 7 718 4T704 0Q701 0 690 0T651 1T578 2Q484 2 455 0H443Q437 6 437 9T439 27Q443 40 445 43L449 46H469Q523 49 533 63L521 213H283L249 155Q208 86 208 74ZM516 260Q516 271 504 416T490 562L463 519Q447 492 400 412L310 260L413 259Q516 259 516 260Z"></path><path stroke-width="0" id="E1934-MJMATHI-52" d="M230 637Q203 637 198 638T193 649Q193 676 204 682Q206 683 378 683Q550 682 564 680Q620 672 658 652T712 606T733 563T739 529Q739 484 710 445T643 385T576 351T538 338L545 333Q612 295 612 223Q612 212 607 162T602 80V71Q602 53 603 43T614 25T640 16Q668 16 686 38T712 85Q717 99 720 102T735 105Q755 105 755 93Q755 75 731 36Q693 -21 641 -21H632Q571 -21 531 4T487 82Q487 109 502 166T517 239Q517 290 474 313Q459 320 449 321T378 323H309L277 193Q244 61 244 59Q244 55 245 54T252 50T269 48T302 46H333Q339 38 339 37T336 19Q332 6 326 0H311Q275 2 180 2Q146 2 117 2T71 2T50 1Q33 1 33 10Q33 12 36 24Q41 43 46 45Q50 46 61 46H67Q94 46 127 49Q141 52 146 61Q149 65 218 339T287 628Q287 635 230 637ZM630 554Q630 
586 609 608T523 636Q521 636 500 636T462 637H440Q393 637 386 627Q385 624 352 494T319 361Q319 360 388 360Q466 361 492 367Q556 377 592 426Q608 449 619 486T630 554Z"></path><path stroke-width="0" id="E1934-MJMATHI-54" d="M40 437Q21 437 21 445Q21 450 37 501T71 602L88 651Q93 669 101 677H569H659Q691 677 697 676T704 667Q704 661 687 553T668 444Q668 437 649 437Q640 437 637 437T631 442L629 445Q629 451 635 490T641 551Q641 586 628 604T573 629Q568 630 515 631Q469 631 457 630T439 622Q438 621 368 343T298 60Q298 48 386 46Q418 46 427 45T436 36Q436 31 433 22Q429 4 424 1L422 0Q419 0 415 0Q410 0 363 1T228 2Q99 2 64 0H49Q43 6 43 9T45 27Q49 40 55 46H83H94Q174 46 189 55Q190 56 191 56Q196 59 201 76T241 233Q258 301 269 344Q339 619 339 625Q339 630 310 630H279Q212 630 191 624Q146 614 121 583T67 467Q60 445 57 441T43 437H40Z"></path><path stroke-width="0" id="E1934-MJMAIN-2212" d="M84 237T84 250T98 270H679Q694 262 694 250T679 230H98Q84 237 84 250Z"></path><path stroke-width="0" id="E1934-MJMAIN-33" d="M127 463Q100 463 85 480T69 524Q69 579 117 622T233 665Q268 665 277 664Q351 652 390 611T430 522Q430 470 396 421T302 350L299 348Q299 347 308 345T337 336T375 315Q457 262 457 175Q457 96 395 37T238 -22Q158 -22 100 21T42 130Q42 158 60 175T105 193Q133 193 151 175T169 130Q169 119 166 110T159 94T148 82T136 74T126 70T118 67L114 66Q165 21 238 21Q293 21 321 74Q338 107 338 175V195Q338 290 274 322Q259 328 213 329L171 330L168 332Q166 335 166 348Q166 366 174 366Q202 366 232 371Q266 376 294 413T322 525V533Q322 590 287 612Q265 626 240 626Q208 626 181 615T143 592T132 580H135Q138 579 143 578T153 573T165 566T175 555T183 540T186 520Q186 498 172 481T127 463Z"></path><path stroke-width="0" id="E1934-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 
150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E1934-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path stroke-width="0" id="E1934-MJMATHI-3B3" d="M31 249Q11 249 11 258Q11 275 26 304T66 365T129 418T206 441Q233 441 239 440Q287 429 318 386T371 255Q385 195 385 170Q385 166 386 166L398 193Q418 244 443 300T486 391T508 430Q510 431 524 431H537Q543 425 543 422Q543 418 522 378T463 251T391 71Q385 55 378 6T357 -100Q341 -165 330 -190T303 -216Q286 -216 286 -188Q286 -138 340 32L346 51L347 69Q348 79 348 100Q348 257 291 317Q251 355 196 355Q148 355 108 329T51 260Q49 251 47 251Q45 249 31 249Z"></path><path stroke-width="0" id="E1934-MJMATHI-47" d="M50 252Q50 367 117 473T286 641T490 704Q580 704 633 653Q642 643 648 636T656 626L657 623Q660 623 684 649Q691 655 699 663T715 679T725 690L740 705H746Q760 705 760 698Q760 694 728 561Q692 422 692 421Q690 416 687 415T669 413H653Q647 419 647 422Q647 423 648 429T650 449T651 481Q651 552 619 605T510 659Q492 659 471 656T418 643T357 615T294 567T236 496T189 394T158 260Q156 242 156 221Q156 173 170 136T206 79T256 45T308 28T353 24Q407 24 452 47T514 106Q517 114 529 161T541 214Q541 222 528 224T468 227H431Q425 233 425 235T427 254Q431 267 437 273H454Q494 271 594 271Q634 271 659 271T695 272T707 272Q721 272 721 263Q721 261 719 249Q714 230 709 228Q706 227 694 227Q674 227 653 224Q646 221 643 215T629 164Q620 131 614 108Q589 6 586 3Q584 1 581 1Q571 1 553 21T530 52Q530 53 528 52T522 47Q448 -22 322 -22Q201 -22 126 55T50 252Z"></path><path stroke-width="0" id="E1934-MJMAIN-3B" d="M78 370Q78 394 95 412T138 430Q162 430 180 414T199 371Q199 346 182 328T139 310T96 327T78 370ZM78 60Q78 85 94 103T137 121Q202 121 202 8Q202 -44 183 -94T144 -169T118 -194Q115 -194 106 -186T95 -174Q94 -171 107 -155T137 -107T160 -38Q161 -32 162 -22T165 -4T165 
4Q165 5 161 4T142 0Q110 0 94 18T78 60Z"></path><path stroke-width="0" id="E1934-MJMAIN-2190" d="M944 261T944 250T929 230H165Q167 228 182 216T211 189T244 152T277 96T303 25Q308 7 308 0Q308 -11 288 -11Q281 -11 278 -11T272 -7T267 2T263 21Q245 94 195 151T73 236Q58 242 55 247Q55 254 59 257T73 264Q121 283 158 314T215 375T247 434T264 480L267 497Q269 503 270 505T275 509T288 511Q308 511 308 500Q308 493 303 475Q293 438 278 406T246 352T215 315T185 287T165 270H929Q944 261 944 250Z"></path><path stroke-width="0" id="E1934-MJMAIN-2B" d="M56 237T56 250T70 270H369V420L370 570Q380 583 389 583Q402 583 409 568V270H707Q722 262 722 250T707 230H409V-68Q401 -82 391 -82H389H387Q375 -82 369 -68V230H70Q56 237 56 250Z"></path><path stroke-width="0" id="E1934-MJMATHI-3B1" d="M34 156Q34 270 120 356T309 442Q379 442 421 402T478 304Q484 275 485 237V208Q534 282 560 374Q564 388 566 390T582 393Q603 393 603 385Q603 376 594 346T558 261T497 161L486 147L487 123Q489 67 495 47T514 26Q528 28 540 37T557 60Q559 67 562 68T577 70Q597 70 597 62Q597 56 591 43Q579 19 556 5T512 -10H505Q438 -10 414 62L411 69L400 61Q390 53 370 41T325 18T267 -2T203 -11Q124 -11 79 39T34 156ZM208 26Q257 26 306 47T379 90L403 112Q401 255 396 290Q382 405 304 405Q235 405 183 332Q156 292 139 224T121 120Q121 71 146 49T208 26Z"></path><path stroke-width="0" id="E1934-MJMAIN-2207" d="M46 676Q46 679 51 683H781Q786 679 786 676Q786 674 617 326T444 -26Q439 -33 416 -33T388 -26Q385 -22 216 326T46 676ZM697 596Q697 597 445 597T193 596Q195 591 319 336T445 80L697 596Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><g transform="translate(9757,-2462)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">算</text><g transform="translate(1052,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">法</text></g><use transform="scale(1.2)" 
xlink:href="#E1934-MJMAINB-37" x="1963" y="0"></use><use transform="scale(1.2)" xlink:href="#E1934-MJMAINB-2D" x="2538" y="0"></use><use transform="scale(1.2)" xlink:href="#E1934-MJMAINB-33" x="2921" y="0"></use><g transform="translate(4945,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">重</text></g><g transform="translate(5998,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">要</text></g><g transform="translate(7050,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">性</text></g><g transform="translate(8103,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">采</text></g><g transform="translate(9156,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">样</text></g><g transform="translate(10209,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">简</text></g><g transform="translate(11225,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">单</text></g><g transform="translate(12278,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">策</text></g><g transform="translate(13331,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">略</text></g><g transform="translate(14384,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 
0)">梯</text></g><g transform="translate(15437,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">度</text></g><g transform="translate(16490,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">求</text></g><g transform="translate(17542,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">解</text></g><g transform="translate(18595,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">最</text></g><g transform="translate(19648,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">优</text></g><g transform="translate(20701,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">策</text></g><g transform="translate(21754,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" font-weight="bold" stroke="none" transform="scale(49.839) matrix(1 0 0 -1 0 0)">略</text></g></g><g transform="translate(0,-7877)"><g transform="translate(-19,0)"><g transform="translate(0,4018)"><g><rect fill="black" stroke="none" width="1569" height="100" x="0" y="500"></rect></g></g><g transform="translate(0,-3819)"><g><rect fill="black" stroke="none" width="1569" height="100" x="0" y="-500"></rect></g></g></g><g transform="translate(1551,0)"><g transform="translate(0,4018)"><g><rect fill="black" stroke="none" width="41597" height="100" x="0" y="500"></rect></g></g><g transform="translate(0,2718)"><use xlink:href="#E1934-MJMAIN-22EF" x="166" y="0"></use><g transform="translate(2505,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 
0)">同</text><g transform="translate(830,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">算</text></g><g transform="translate(1661,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">法</text></g><use xlink:href="#E1934-MJMAIN-37" x="2741" y="0"></use><use xlink:href="#E1934-MJMAIN-2D" x="3241" y="0"></use><use xlink:href="#E1934-MJMAIN-31" x="3574" y="0"></use></g><use xlink:href="#E1934-MJMAIN-22EF" x="7746" y="0"></use></g><g transform="translate(0,1418)"><use xlink:href="#E1934-MJMAIN-32"></use><use xlink:href="#E1934-MJMAIN-2E" x="500" y="0"></use><use xlink:href="#E1934-MJMAIN-31" x="778" y="0"></use><g transform="translate(1278,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">（</text></g><g transform="translate(2108,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">采</text></g><g transform="translate(2939,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">样</text></g><g transform="translate(3769,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">）</text></g><g transform="translate(4600,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">指</text></g><g transform="translate(5431,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">定</text></g><g transform="translate(6261,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">行</text></g><g transform="translate(7092,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) 
matrix(1 0 0 -1 0 0)">为</text></g><g transform="translate(7923,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">策</text></g><g transform="translate(8753,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">略</text></g><g transform="translate(9834,0)"><use xlink:href="#E1934-MJMATHI-62" x="0" y="0"></use><use xlink:href="#E1934-MJMAIN-226B" x="706" y="0"></use><use xlink:href="#E1934-MJMATHI-3C0" x="1984" y="0"></use><use xlink:href="#E1934-MJMAIN-28" x="2557" y="0"></use><use xlink:href="#E1934-MJMATHI-3B8" x="2946" y="0"></use><use xlink:href="#E1934-MJMAIN-29" x="3415" y="0"></use></g><g transform="translate(13639,0)"><g transform="translate(250,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">，</text></g><g transform="translate(1080,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">并</text></g><g transform="translate(1911,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">用</text></g><g transform="translate(2741,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">其</text></g><g transform="translate(3572,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">生</text></g><g transform="translate(4403,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">成</text></g><g transform="translate(5233,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">轨</text></g><g transform="translate(6064,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" 
transform="scale(41.533) matrix(1 0 0 -1 0 0)">迹</text></g></g><g transform="translate(20784,0)"><use xlink:href="#E1934-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1934-MJMAIN-30" x="866" y="-213"></use><use xlink:href="#E1934-MJMAIN-2C" x="1066" y="0"></use><g transform="translate(1511,0)"><use xlink:href="#E1934-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1934-MJMAIN-30" x="1060" y="-213"></use></g><use xlink:href="#E1934-MJMAIN-2C" x="2714" y="0"></use><g transform="translate(3159,0)"><use xlink:href="#E1934-MJMATHI-52" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1934-MJMAIN-31" x="1073" y="-213"></use></g><use xlink:href="#E1934-MJMAIN-2C" x="4371" y="0"></use><g transform="translate(4816,0)"><use xlink:href="#E1934-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1934-MJMAIN-31" x="866" y="-213"></use></g><use xlink:href="#E1934-MJMAIN-2C" x="5883" y="0"></use><use xlink:href="#E1934-MJMAIN-22EF" x="6327" y="0"></use><use xlink:href="#E1934-MJMAIN-2C" x="7666" y="0"></use><g transform="translate(8111,0)"><use xlink:href="#E1934-MJMATHI-53" x="0" y="0"></use><g transform="translate(613,-150)"><use transform="scale(0.707)" xlink:href="#E1934-MJMATHI-54" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1934-MJMAIN-2212" x="704" y="0"></use><use transform="scale(0.707)" xlink:href="#E1934-MJMAIN-31" x="1482" y="0"></use></g></g><use xlink:href="#E1934-MJMAIN-2C" x="10225" y="0"></use><g transform="translate(10670,0)"><use xlink:href="#E1934-MJMATHI-41" x="0" y="0"></use><g transform="translate(750,-150)"><use transform="scale(0.707)" xlink:href="#E1934-MJMATHI-54" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1934-MJMAIN-2212" x="704" y="0"></use><use transform="scale(0.707)" xlink:href="#E1934-MJMAIN-31" x="1482" y="0"></use></g></g><use xlink:href="#E1934-MJMAIN-2C" x="12921" y="0"></use><g transform="translate(13366,0)"><use 
xlink:href="#E1934-MJMATHI-52" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1934-MJMATHI-54" x="1073" y="-213"></use></g><use xlink:href="#E1934-MJMAIN-2C" x="14723" y="0"></use><g transform="translate(15167,0)"><use xlink:href="#E1934-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1934-MJMATHI-54" x="866" y="-213"></use></g></g><g transform="translate(37163,0)"><g transform="translate(250,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">。</text></g></g></g><g transform="translate(0,68)"><use xlink:href="#E1934-MJMAIN-22EF" x="166" y="0"></use><g transform="translate(2505,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">同</text><g transform="translate(830,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">算</text></g><g transform="translate(1661,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">法</text></g><use xlink:href="#E1934-MJMAIN-37" x="2741" y="0"></use><use xlink:href="#E1934-MJMAIN-2D" x="3241" y="0"></use><use xlink:href="#E1934-MJMAIN-31" x="3574" y="0"></use></g><use xlink:href="#E1934-MJMAIN-22EF" x="7746" y="0"></use></g><g transform="translate(0,-1775)"><g transform="translate(2000,0)"><use xlink:href="#E1934-MJMAIN-32"></use><use xlink:href="#E1934-MJMAIN-2E" x="500" y="0"></use><use xlink:href="#E1934-MJMAIN-33" x="778" y="0"></use><use xlink:href="#E1934-MJMAIN-2E" x="1278" y="0"></use><use xlink:href="#E1934-MJMAIN-32" x="1556" y="0"></use><g transform="translate(2750,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">更</text><g transform="translate(830,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 
0 0)">新</text></g></g><use xlink:href="#E1934-MJMATHI-3B8" x="4661" y="0"></use><g transform="translate(5130,0)"><g transform="translate(250,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">以</text></g><g transform="translate(1080,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">减</text></g><g transform="translate(1911,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">小</text></g></g><g transform="translate(8122,0)"><use xlink:href="#E1934-MJMAIN-2212" x="0" y="0"></use><g transform="translate(778,0)"><g transform="translate(120,0)"><rect stroke="none" width="4234" height="60" x="0" y="220"></rect><use xlink:href="#E1934-MJMAIN-31" x="1867" y="676"></use><g transform="translate(60,-694)"><use xlink:href="#E1934-MJMATHI-62" x="0" y="0"></use><use xlink:href="#E1934-MJMAIN-28" x="429" y="0"></use><g transform="translate(818,0)"><use xlink:href="#E1934-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1934-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1934-MJMAIN-2223" x="2201" y="0"></use><g transform="translate(2756,0)"><use xlink:href="#E1934-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1934-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1934-MJMAIN-29" x="3725" y="0"></use></g></g></g><g transform="translate(5252,0)"><use xlink:href="#E1934-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1934-MJMATHI-74" x="778" y="583"></use></g><use xlink:href="#E1934-MJMATHI-47" x="6157" y="0"></use><use xlink:href="#E1934-MJMATHI-3C0" x="6943" y="0"></use><use xlink:href="#E1934-MJMAIN-28" x="7516" y="0"></use><g transform="translate(7905,0)"><use xlink:href="#E1934-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1934-MJMATHI-74" x="1060" 
y="-213"></use></g><use xlink:href="#E1934-MJMAIN-2223" x="9288" y="0"></use><g transform="translate(9844,0)"><use xlink:href="#E1934-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1934-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1934-MJMAIN-3B" x="10812" y="0"></use><use xlink:href="#E1934-MJMATHI-3B8" x="11257" y="0"></use><use xlink:href="#E1934-MJMAIN-29" x="11726" y="0"></use></g><g transform="translate(20238,0)"><g transform="translate(250,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">，</text></g><g transform="translate(1080,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">如</text></g></g><g transform="translate(22399,0)"><use xlink:href="#E1934-MJMATHI-3B8" x="0" y="0"></use><use xlink:href="#E1934-MJMAIN-2190" x="746" y="0"></use><use xlink:href="#E1934-MJMATHI-3B8" x="2024" y="0"></use><use xlink:href="#E1934-MJMAIN-2B" x="2715" y="0"></use><use xlink:href="#E1934-MJMATHI-3B1" x="3716" y="0"></use><g transform="translate(4356,0)"><g transform="translate(120,0)"><rect stroke="none" width="4234" height="60" x="0" y="220"></rect><use xlink:href="#E1934-MJMAIN-31" x="1867" y="676"></use><g transform="translate(60,-694)"><use xlink:href="#E1934-MJMATHI-62" x="0" y="0"></use><use xlink:href="#E1934-MJMAIN-28" x="429" y="0"></use><g transform="translate(818,0)"><use xlink:href="#E1934-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1934-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1934-MJMAIN-2223" x="2201" y="0"></use><g transform="translate(2756,0)"><use xlink:href="#E1934-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1934-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1934-MJMAIN-29" x="3725" y="0"></use></g></g></g><g transform="translate(8830,0)"><use xlink:href="#E1934-MJMATHI-3B3" x="0" 
y="0"></use><use transform="scale(0.707)" xlink:href="#E1934-MJMATHI-74" x="778" y="583"></use></g><use xlink:href="#E1934-MJMATHI-47" x="9735" y="0"></use><use xlink:href="#E1934-MJMAIN-2207" x="10521" y="0"></use><use xlink:href="#E1934-MJMATHI-3C0" x="11354" y="0"></use><use xlink:href="#E1934-MJMAIN-28" x="11927" y="0"></use><g transform="translate(12316,0)"><use xlink:href="#E1934-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1934-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1934-MJMAIN-2223" x="13699" y="0"></use><g transform="translate(14255,0)"><use xlink:href="#E1934-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1934-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1934-MJMAIN-3B" x="15223" y="0"></use><use xlink:href="#E1934-MJMATHI-3B8" x="15668" y="0"></use><use xlink:href="#E1934-MJMAIN-29" x="16137" y="0"></use></g><g transform="translate(38926,0)"><g transform="translate(250,0)"><text font-family="STIXGeneral, 'PingFang SC', serif" stroke="none" transform="scale(41.533) matrix(1 0 0 -1 0 0)">。</text></g></g></g></g><g transform="translate(0,-3819)"><g><rect fill="black" stroke="none" width="41597" height="100" x="0" y="-500"></rect></g></g></g></g></g></svg></span></div><script type="math/tex; mode=display" id="MathJax-Element-832">\; \\ \; \\
\large \textbf{Algorithm 7-3   Simple policy gradient with importance sampling for finding the optimal policy} \\
\begin{split}
\rule[5pt]{10mm}{0.1em} &\rule[5pt]{265mm}{0.1em} \\
&\cdots \quad \text{same as Algorithm 7-1} \quad \cdots \\
&\text{2.1 (Sampling) Choose a behavior policy $b \gg \pi(\theta)$ and use it to generate a trajectory $S_0,A_0,R_1,S_1,\cdots,S_{T-1},A_{T-1},R_T,S_T$.} \\
&\cdots \quad \text{same as Algorithm 7-1} \quad \cdots \\
&\qquad \text{2.3.2 $\;\,$Update $\theta$ to decrease $-\frac{1}{b(A_t \mid S_t)} \gamma^t G \pi(A_t \mid S_t; \theta)$, e.g. $\theta \leftarrow \theta + \alpha\frac{1}{b(A_t \mid S_t)} \gamma^t G \nabla \pi(A_t \mid S_t; \theta)$.} \\
\rule[-5pt]{10mm}{0.1em} &\rule[-5pt]{265mm}{0.1em}
\end{split}
\; \\ \; \\</script></div></div><p><span>Importance sampling makes it possible to update the policy parameters with samples generated by another policy, but it can introduce high variance, so the resulting algorithm is less stable than its on-policy counterpart.</span></p><h3><a name="四策略梯度更新与极大似然估计的关系" class="md-header-anchor"></a><span>4. The Relationship Between Policy Gradient Updates and Maximum Likelihood Estimation</span></h3><p><span>以上算法都是通过更新策略参数 </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="1.089ex" height="1.939ex" viewBox="0 -749.6 469 834.7" role="img" focusable="false" style="vertical-align: -0.198ex;"><defs><path stroke-width="0" id="E2009-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E2009-MJMATHI-3B8" x="0" y="0"></use></g></svg></span><script type="math/tex">\theta</script><span> 以试图增大形如 </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="20.421ex" height="2.71ex" viewBox="0 -832.7 8792.4 1166.9" role="img" focusable="false" style="vertical-align: -0.776ex;"><defs><path stroke-width="0" id="E1992-MJMATHI-45" d="M492 213Q472 213 472 226Q472 230 477 250T482 285Q482 316 461 323T364 330H312Q311 328 277 192T243 52Q243 48 254 48T334 46Q428 46 458 48T518 61Q567 77 599 117T670 248Q680 270 683 272Q690 274 698 274Q718 274 718 261Q613 7 608 2Q605 0 322 0H133Q31 0 31 11Q31 13 34 25Q38 41 42 43T65 46Q92 46 125 49Q139 52 144 61Q146 66 215 342T285 622Q285 629 281 629Q273 632 228 634H197Q191 640 191 642T193 659Q197 676 203 680H757Q764 676 764 669Q764 664 751 557T737 447Q735 440 717 440H705Q698 445 698 453L701 476Q704 
500 704 528Q704 558 697 578T678 609T643 625T596 632T532 634H485Q397 633 392 631Q388 629 386 622Q385 619 355 499T324 377Q347 376 372 376H398Q464 376 489 391T534 472Q538 488 540 490T557 493Q562 493 565 493T570 492T572 491T574 487T577 483L544 351Q511 218 508 216Q505 213 492 213Z"></path><path stroke-width="0" id="E1992-MJMAIN-5B" d="M118 -250V750H255V710H158V-210H255V-250H118Z"></path><path stroke-width="0" id="E1992-MJMAIN-3A8" d="M340 622Q338 623 335 625T331 629T325 631T314 634T298 635T274 636T239 637H212V683H224Q248 680 389 680T554 683H566V637H539Q479 637 464 635T439 622L438 407Q438 192 439 192Q443 193 449 195T474 207T507 232T536 276T557 344Q560 365 562 417T573 493Q587 536 620 544Q627 546 671 546H715L722 540V515Q714 509 708 509Q680 505 671 476T658 392T644 307Q599 177 451 153L438 151V106L439 61Q446 54 451 52T476 48T539 46H566V0H554Q530 3 389 3T224 0H212V46H239Q259 46 273 46T298 47T314 48T325 51T331 54T335 57T340 61V151Q126 178 117 406Q115 503 69 509Q55 509 55 526Q55 541 59 543T86 546H107H120Q150 546 161 543T184 528Q198 514 204 493Q212 472 213 420T226 316T272 230Q287 216 303 207T330 194L339 192Q340 192 340 407V622Z"></path><path stroke-width="0" id="E1992-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E1992-MJMAIN-6C" d="M42 46H56Q95 46 103 60V68Q103 77 103 91T103 124T104 167T104 217T104 272T104 329Q104 366 104 407T104 482T104 542T103 586T103 603Q100 622 89 628T44 637H26V660Q26 683 28 683L38 684Q48 685 67 686T104 688Q121 689 141 690T171 693T182 694H185V379Q185 62 186 60Q190 52 198 49Q219 46 
247 46H263V0H255L232 1Q209 2 183 2T145 3T107 3T57 1L34 0H26V46H42Z"></path><path stroke-width="0" id="E1992-MJMAIN-6E" d="M41 46H55Q94 46 102 60V68Q102 77 102 91T102 122T103 161T103 203Q103 234 103 269T102 328V351Q99 370 88 376T43 385H25V408Q25 431 27 431L37 432Q47 433 65 434T102 436Q119 437 138 438T167 441T178 442H181V402Q181 364 182 364T187 369T199 384T218 402T247 421T285 437Q305 442 336 442Q450 438 463 329Q464 322 464 190V104Q464 66 466 59T477 49Q498 46 526 46H542V0H534L510 1Q487 2 460 2T422 3Q319 3 310 0H302V46H318Q379 46 379 62Q380 64 380 200Q379 335 378 343Q372 371 358 385T334 402T308 404Q263 404 229 370Q202 343 195 315T187 232V168V108Q187 78 188 68T191 55T200 49Q221 46 249 46H265V0H257L234 1Q210 2 183 2T145 3Q42 3 33 0H25V46H41Z"></path><path stroke-width="0" id="E1992-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1992-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1992-MJMATHI-41" d="M208 74Q208 50 254 46Q272 46 272 35Q272 34 270 22Q267 8 264 4T251 0Q249 0 239 0T205 1T141 2Q70 2 50 0H42Q35 7 35 11Q37 38 48 46H62Q132 49 164 96Q170 102 345 401T523 704Q530 716 547 716H555H572Q578 707 578 706L606 383Q634 60 636 57Q641 46 701 46Q726 46 726 36Q726 34 723 22Q720 7 718 4T704 0Q701 0 690 0T651 1T578 2Q484 2 455 0H443Q437 6 437 9T439 27Q443 40 445 43L449 
46H469Q523 49 533 63L521 213H283L249 155Q208 86 208 74ZM516 260Q516 271 504 416T490 562L463 519Q447 492 400 412L310 260L413 259Q516 259 516 260Z"></path><path stroke-width="0" id="E1992-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path stroke-width="0" id="E1992-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 462Q176 523 208 573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 577T585 459T569 456Q549 456 549 465Q549 471 550 475Q550 478 551 494T553 520Q553 554 544 579T526 616T501 641Q465 662 419 662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 -18T65 -22Q52 -22 52 -14Q52 -11 110 221Q112 227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 153Q144 114 160 87T203 47T255 29T308 24Z"></path><path stroke-width="0" id="E1992-MJMAIN-3B" d="M78 370Q78 394 95 412T138 430Q162 430 180 414T199 371Q199 346 182 328T139 310T96 327T78 370ZM78 60Q78 85 94 103T137 121Q202 121 202 8Q202 -44 183 -94T144 -169T118 -194Q115 -194 106 -186T95 -174Q94 -171 107 -155T137 -107T160 -38Q161 -32 162 -22T165 -4T165 4Q165 5 161 4T142 0Q110 0 94 18T78 60Z"></path><path stroke-width="0" id="E1992-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E1992-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 
119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path><path stroke-width="0" id="E1992-MJMAIN-5D" d="M22 710V750H159V-250H22V-210H119V710H22Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1992-MJMATHI-45" x="0" y="0"></use><use xlink:href="#E1992-MJMAIN-5B" x="764" y="0"></use><g transform="translate(1042,0)"><use xlink:href="#E1992-MJMAIN-3A8" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1992-MJMATHI-74" x="1100" y="-213"></use></g><g transform="translate(2341,0)"><use xlink:href="#E1992-MJMAIN-6C"></use><use xlink:href="#E1992-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1992-MJMATHI-3C0" x="3342" y="0"></use><use xlink:href="#E1992-MJMAIN-28" x="3915" y="0"></use><g transform="translate(4304,0)"><use xlink:href="#E1992-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1992-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1992-MJMAIN-2223" x="5687" y="0"></use><g transform="translate(6243,0)"><use xlink:href="#E1992-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1992-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1992-MJMAIN-3B" x="7211" y="0"></use><use xlink:href="#E1992-MJMATHI-3B8" x="7656" y="0"></use><use xlink:href="#E1992-MJMAIN-29" x="8125" y="0"></use><use xlink:href="#E1992-MJMAIN-5D" x="8514" y="0"></use></g></svg></span><script type="math/tex">E[\Psi_t\ln\pi(A_t \mid S_t;\theta)]</script><span> 的目标（单个条目则为 </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="17.355ex" height="2.71ex" viewBox="0 -832.7 7472.4 1166.9" role="img" focusable="false" style="vertical-align: -0.776ex;"><defs><path stroke-width="0" id="E1979-MJMAIN-3A8" d="M340 622Q338 623 335 
625T331 629T325 631T314 634T298 635T274 636T239 637H212V683H224Q248 680 389 680T554 683H566V637H539Q479 637 464 635T439 622L438 407Q438 192 439 192Q443 193 449 195T474 207T507 232T536 276T557 344Q560 365 562 417T573 493Q587 536 620 544Q627 546 671 546H715L722 540V515Q714 509 708 509Q680 505 671 476T658 392T644 307Q599 177 451 153L438 151V106L439 61Q446 54 451 52T476 48T539 46H566V0H554Q530 3 389 3T224 0H212V46H239Q259 46 273 46T298 47T314 48T325 51T331 54T335 57T340 61V151Q126 178 117 406Q115 503 69 509Q55 509 55 526Q55 541 59 543T86 546H107H120Q150 546 161 543T184 528Q198 514 204 493Q212 472 213 420T226 316T272 230Q287 216 303 207T330 194L339 192Q340 192 340 407V622Z"></path><path stroke-width="0" id="E1979-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E1979-MJMAIN-6C" d="M42 46H56Q95 46 103 60V68Q103 77 103 91T103 124T104 167T104 217T104 272T104 329Q104 366 104 407T104 482T104 542T103 586T103 603Q100 622 89 628T44 637H26V660Q26 683 28 683L38 684Q48 685 67 686T104 688Q121 689 141 690T171 693T182 694H185V379Q185 62 186 60Q190 52 198 49Q219 46 247 46H263V0H255L232 1Q209 2 183 2T145 3T107 3T57 1L34 0H26V46H42Z"></path><path stroke-width="0" id="E1979-MJMAIN-6E" d="M41 46H55Q94 46 102 60V68Q102 77 102 91T102 122T103 161T103 203Q103 234 103 269T102 328V351Q99 370 88 376T43 385H25V408Q25 431 27 431L37 432Q47 433 65 434T102 436Q119 437 138 438T167 441T178 442H181V402Q181 364 182 364T187 369T199 384T218 402T247 421T285 437Q305 442 336 442Q450 438 463 329Q464 322 464 190V104Q464 66 466 59T477 
49Q498 46 526 46H542V0H534L510 1Q487 2 460 2T422 3Q319 3 310 0H302V46H318Q379 46 379 62Q380 64 380 200Q379 335 378 343Q372 371 358 385T334 402T308 404Q263 404 229 370Q202 343 195 315T187 232V168V108Q187 78 188 68T191 55T200 49Q221 46 249 46H265V0H257L234 1Q210 2 183 2T145 3Q42 3 33 0H25V46H41Z"></path><path stroke-width="0" id="E1979-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1979-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1979-MJMATHI-41" d="M208 74Q208 50 254 46Q272 46 272 35Q272 34 270 22Q267 8 264 4T251 0Q249 0 239 0T205 1T141 2Q70 2 50 0H42Q35 7 35 11Q37 38 48 46H62Q132 49 164 96Q170 102 345 401T523 704Q530 716 547 716H555H572Q578 707 578 706L606 383Q634 60 636 57Q641 46 701 46Q726 46 726 36Q726 34 723 22Q720 7 718 4T704 0Q701 0 690 0T651 1T578 2Q484 2 455 0H443Q437 6 437 9T439 27Q443 40 445 43L449 46H469Q523 49 533 63L521 213H283L249 155Q208 86 208 74ZM516 260Q516 271 504 416T490 562L463 519Q447 492 400 412L310 260L413 259Q516 259 516 260Z"></path><path stroke-width="0" id="E1979-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path stroke-width="0" id="E1979-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 462Q176 523 208 
573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 577T585 459T569 456Q549 456 549 465Q549 471 550 475Q550 478 551 494T553 520Q553 554 544 579T526 616T501 641Q465 662 419 662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 -18T65 -22Q52 -22 52 -14Q52 -11 110 221Q112 227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 153Q144 114 160 87T203 47T255 29T308 24Z"></path><path stroke-width="0" id="E1979-MJMAIN-3B" d="M78 370Q78 394 95 412T138 430Q162 430 180 414T199 371Q199 346 182 328T139 310T96 327T78 370ZM78 60Q78 85 94 103T137 121Q202 121 202 8Q202 -44 183 -94T144 -169T118 -194Q115 -194 106 -186T95 -174Q94 -171 107 -155T137 -107T160 -38Q161 -32 162 -22T165 -4T165 4Q165 5 161 4T142 0Q110 0 94 18T78 60Z"></path><path stroke-width="0" id="E1979-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E1979-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1979-MJMAIN-3A8" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1979-MJMATHI-74" x="1100" y="-213"></use><g transform="translate(1299,0)"><use 
xlink:href="#E1979-MJMAIN-6C"></use><use xlink:href="#E1979-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1979-MJMATHI-3C0" x="2300" y="0"></use><use xlink:href="#E1979-MJMAIN-28" x="2873" y="0"></use><g transform="translate(3262,0)"><use xlink:href="#E1979-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1979-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1979-MJMAIN-2223" x="4645" y="0"></use><g transform="translate(5201,0)"><use xlink:href="#E1979-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1979-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1979-MJMAIN-3B" x="6169" y="0"></use><use xlink:href="#E1979-MJMATHI-3B8" x="6614" y="0"></use><use xlink:href="#E1979-MJMAIN-29" x="7083" y="0"></use></g></svg></span><script type="math/tex">\Psi_t\ln\pi(A_t \mid S_t;\theta)</script><span> ），其中 </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="2.632ex" height="2.228ex" viewBox="0 -749.6 1133.3 959.2" role="img" focusable="false" style="vertical-align: -0.487ex;"><defs><path stroke-width="0" id="E2010-MJMAIN-3A8" d="M340 622Q338 623 335 625T331 629T325 631T314 634T298 635T274 636T239 637H212V683H224Q248 680 389 680T554 683H566V637H539Q479 637 464 635T439 622L438 407Q438 192 439 192Q443 193 449 195T474 207T507 232T536 276T557 344Q560 365 562 417T573 493Q587 536 620 544Q627 546 671 546H715L722 540V515Q714 509 708 509Q680 505 671 476T658 392T644 307Q599 177 451 153L438 151V106L439 61Q446 54 451 52T476 48T539 46H566V0H554Q530 3 389 3T224 0H212V46H239Q259 46 273 46T298 47T314 48T325 51T331 54T335 57T340 61V151Q126 178 117 406Q115 503 69 509Q55 509 55 526Q55 541 59 543T86 546H107H120Q150 546 161 543T184 528Q198 514 204 493Q212 472 213 420T226 316T272 230Q287 216 303 207T330 194L339 192Q340 192 340 407V622Z"></path><path stroke-width="0" id="E2010-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 
411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E2010-MJMAIN-3A8" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2010-MJMATHI-74" x="1100" y="-213"></use></g></svg></span><script type="math/tex">\Psi_t</script><span> 可取 </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="2.879ex" height="2.228ex" viewBox="0 -749.6 1239.6 959.2" role="img" focusable="false" style="vertical-align: -0.487ex;"><defs><path stroke-width="0" id="E1981-MJMATHI-47" d="M50 252Q50 367 117 473T286 641T490 704Q580 704 633 653Q642 643 648 636T656 626L657 623Q660 623 684 649Q691 655 699 663T715 679T725 690L740 705H746Q760 705 760 698Q760 694 728 561Q692 422 692 421Q690 416 687 415T669 413H653Q647 419 647 422Q647 423 648 429T650 449T651 481Q651 552 619 605T510 659Q492 659 471 656T418 643T357 615T294 567T236 496T189 394T158 260Q156 242 156 221Q156 173 170 136T206 79T256 45T308 28T353 24Q407 24 452 47T514 106Q517 114 529 161T541 214Q541 222 528 224T468 227H431Q425 233 425 235T427 254Q431 267 437 273H454Q494 271 594 271Q634 271 659 271T695 272T707 272Q721 272 721 263Q721 261 719 249Q714 230 709 228Q706 227 694 227Q674 227 653 224Q646 221 643 215T629 164Q620 131 614 108Q589 6 586 3Q584 1 581 1Q571 1 553 21T530 52Q530 53 528 52T522 47Q448 -22 322 -22Q201 -22 126 55T50 252Z"></path><path stroke-width="0" id="E1981-MJMAIN-30" d="M96 585Q152 666 249 666Q297 666 345 640T423 
548Q460 465 460 320Q460 165 417 83Q397 41 362 16T301 -15T250 -22Q224 -22 198 -16T137 16T82 83Q39 165 39 320Q39 494 96 585ZM321 597Q291 629 250 629Q208 629 178 597Q153 571 145 525T137 333Q137 175 145 125T181 46Q209 16 250 16Q290 16 318 46Q347 76 354 130T362 333Q362 478 354 524T321 597Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1981-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1981-MJMAIN-30" x="1111" y="-213"></use></g></svg></span><script type="math/tex">G_0</script><span> 、</span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="2.651ex" height="2.228ex" viewBox="0 -749.6 1141.3 959.2" role="img" focusable="false" style="vertical-align: -0.487ex;"><defs><path stroke-width="0" id="E1982-MJMATHI-47" d="M50 252Q50 367 117 473T286 641T490 704Q580 704 633 653Q642 643 648 636T656 626L657 623Q660 623 684 649Q691 655 699 663T715 679T725 690L740 705H746Q760 705 760 698Q760 694 728 561Q692 422 692 421Q690 416 687 415T669 413H653Q647 419 647 422Q647 423 648 429T650 449T651 481Q651 552 619 605T510 659Q492 659 471 656T418 643T357 615T294 567T236 496T189 394T158 260Q156 242 156 221Q156 173 170 136T206 79T256 45T308 28T353 24Q407 24 452 47T514 106Q517 114 529 161T541 214Q541 222 528 224T468 227H431Q425 233 425 235T427 254Q431 267 437 273H454Q494 271 594 271Q634 271 659 271T695 272T707 272Q721 272 721 263Q721 261 719 249Q714 230 709 228Q706 227 694 227Q674 227 653 224Q646 221 643 215T629 164Q620 131 614 108Q589 6 586 3Q584 1 581 1Q571 1 553 21T530 52Q530 53 528 52T522 47Q448 -22 322 -22Q201 -22 126 55T50 252Z"></path><path stroke-width="0" id="E1982-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 
431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1982-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1982-MJMATHI-74" x="1111" y="-213"></use></g></svg></span><script type="math/tex">G_t</script><span> 等值。从监督学习的角度来看，如果已经有一个表达式未知的策略 </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="1.331ex" height="1.36ex" viewBox="0 -500.4 573 585.5" role="img" focusable="false" style="vertical-align: -0.198ex;"><defs><path stroke-width="0" id="E1986-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1986-MJMATHI-3C0" x="0" y="0"></use></g></svg></span><script type="math/tex">\pi</script><span> ，当要用策略 </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="4.227ex" height="2.71ex" viewBox="0 -832.7 1820 1166.9" role="img" focusable="false" style="vertical-align: -0.776ex;"><defs><path stroke-width="0" id="E1996-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 
334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1996-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1996-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E1996-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1996-MJMATHI-3C0" x="0" y="0"></use><use xlink:href="#E1996-MJMAIN-28" x="573" y="0"></use><use xlink:href="#E1996-MJMATHI-3B8" x="962" y="0"></use><use xlink:href="#E1996-MJMAIN-29" x="1431" y="0"></use></g></svg></span><script type="math/tex">\pi(\theta)</script><span> 来近似它时，可以考虑用最大似然的方法来估计策略参数 </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg 
xmlns:xlink="http://www.w3.org/1999/xlink" width="1.089ex" height="1.939ex" viewBox="0 -749.6 469 834.7" role="img" focusable="false" style="vertical-align: -0.198ex;"><defs><path stroke-width="0" id="E2009-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E2009-MJMATHI-3B8" x="0" y="0"></use></g></svg></span><script type="math/tex">\theta</script><span> 。具体而言，未知策略 </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="1.331ex" height="1.36ex" viewBox="0 -500.4 573 585.5" role="img" focusable="false" style="vertical-align: -0.198ex;"><defs><path stroke-width="0" id="E1986-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1986-MJMATHI-3C0" x="0" y="0"></use></g></svg></span><script type="math/tex">\pi</script><span> 的许多样本对于策略 </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg 
xmlns:xlink="http://www.w3.org/1999/xlink" width="4.227ex" height="2.71ex" viewBox="0 -832.7 1820 1166.9" role="img" focusable="false" style="vertical-align: -0.776ex;"><defs><path stroke-width="0" id="E1996-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1996-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1996-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E1996-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1996-MJMATHI-3C0" x="0" y="0"></use><use xlink:href="#E1996-MJMAIN-28" x="573" y="0"></use><use xlink:href="#E1996-MJMATHI-3B8" 
x="962" y="0"></use><use xlink:href="#E1996-MJMAIN-29" x="1431" y="0"></use></g></svg></span><script type="math/tex">\pi(\theta)</script><span> 的对数似然值正比于 </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="17.917ex" height="2.71ex" viewBox="0 -832.7 7714.4 1166.9" role="img" focusable="false" style="vertical-align: -0.776ex;"><defs><path stroke-width="0" id="E1990-MJMATHI-45" d="M492 213Q472 213 472 226Q472 230 477 250T482 285Q482 316 461 323T364 330H312Q311 328 277 192T243 52Q243 48 254 48T334 46Q428 46 458 48T518 61Q567 77 599 117T670 248Q680 270 683 272Q690 274 698 274Q718 274 718 261Q613 7 608 2Q605 0 322 0H133Q31 0 31 11Q31 13 34 25Q38 41 42 43T65 46Q92 46 125 49Q139 52 144 61Q146 66 215 342T285 622Q285 629 281 629Q273 632 228 634H197Q191 640 191 642T193 659Q197 676 203 680H757Q764 676 764 669Q764 664 751 557T737 447Q735 440 717 440H705Q698 445 698 453L701 476Q704 500 704 528Q704 558 697 578T678 609T643 625T596 632T532 634H485Q397 633 392 631Q388 629 386 622Q385 619 355 499T324 377Q347 376 372 376H398Q464 376 489 391T534 472Q538 488 540 490T557 493Q562 493 565 493T570 492T572 491T574 487T577 483L544 351Q511 218 508 216Q505 213 492 213Z"></path><path stroke-width="0" id="E1990-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1990-MJMAIN-6C" d="M42 46H56Q95 46 103 60V68Q103 77 103 91T103 124T104 167T104 217T104 272T104 329Q104 366 104 407T104 482T104 542T103 586T103 603Q100 622 89 628T44 637H26V660Q26 683 28 683L38 684Q48 685 67 686T104 688Q121 689 141 690T171 693T182 694H185V379Q185 62 186 60Q190 52 198 49Q219 46 247 46H263V0H255L232 1Q209 2 183 2T145 3T107 3T57 1L34 0H26V46H42Z"></path><path stroke-width="0" 
id="E1990-MJMAIN-6E" d="M41 46H55Q94 46 102 60V68Q102 77 102 91T102 122T103 161T103 203Q103 234 103 269T102 328V351Q99 370 88 376T43 385H25V408Q25 431 27 431L37 432Q47 433 65 434T102 436Q119 437 138 438T167 441T178 442H181V402Q181 364 182 364T187 369T199 384T218 402T247 421T285 437Q305 442 336 442Q450 438 463 329Q464 322 464 190V104Q464 66 466 59T477 49Q498 46 526 46H542V0H534L510 1Q487 2 460 2T422 3Q319 3 310 0H302V46H318Q379 46 379 62Q380 64 380 200Q379 335 378 343Q372 371 358 385T334 402T308 404Q263 404 229 370Q202 343 195 315T187 232V168V108Q187 78 188 68T191 55T200 49Q221 46 249 46H265V0H257L234 1Q210 2 183 2T145 3Q42 3 33 0H25V46H41Z"></path><path stroke-width="0" id="E1990-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1990-MJMATHI-41" d="M208 74Q208 50 254 46Q272 46 272 35Q272 34 270 22Q267 8 264 4T251 0Q249 0 239 0T205 1T141 2Q70 2 50 0H42Q35 7 35 11Q37 38 48 46H62Q132 49 164 96Q170 102 345 401T523 704Q530 716 547 716H555H572Q578 707 578 706L606 383Q634 60 636 57Q641 46 701 46Q726 46 726 36Q726 34 723 22Q720 7 718 4T704 0Q701 0 690 0T651 1T578 2Q484 2 455 0H443Q437 6 437 9T439 27Q443 40 445 43L449 46H469Q523 49 533 63L521 213H283L249 155Q208 86 208 74ZM516 260Q516 271 504 416T490 562L463 519Q447 492 400 412L310 260L413 259Q516 259 516 260Z"></path><path stroke-width="0" id="E1990-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 
272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E1990-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path stroke-width="0" id="E1990-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 462Q176 523 208 573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 577T585 459T569 456Q549 456 549 465Q549 471 550 475Q550 478 551 494T553 520Q553 554 544 579T526 616T501 641Q465 662 419 662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 -18T65 -22Q52 -22 52 -14Q52 -11 110 221Q112 227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 153Q144 114 160 87T203 47T255 29T308 24Z"></path><path stroke-width="0" id="E1990-MJMAIN-3B" d="M78 370Q78 394 95 412T138 430Q162 430 180 414T199 371Q199 346 182 328T139 310T96 327T78 370ZM78 60Q78 85 94 103T137 121Q202 121 202 8Q202 -44 183 -94T144 -169T118 -194Q115 -194 106 -186T95 -174Q94 -171 107 -155T137 -107T160 -38Q161 -32 162 -22T165 -4T165 4Q165 5 161 4T142 0Q110 0 94 18T78 60Z"></path><path stroke-width="0" id="E1990-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 
132Z"></path><path stroke-width="0" id="E1990-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1990-MJMATHI-45" x="0" y="0"></use><use xlink:href="#E1990-MJMAIN-28" x="764" y="0"></use><g transform="translate(1153,0)"><use xlink:href="#E1990-MJMAIN-6C"></use><use xlink:href="#E1990-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1990-MJMATHI-3C0" x="2153" y="0"></use><use xlink:href="#E1990-MJMAIN-28" x="2726" y="0"></use><g transform="translate(3115,0)"><use xlink:href="#E1990-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1990-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1990-MJMAIN-2223" x="4498" y="0"></use><g transform="translate(5054,0)"><use xlink:href="#E1990-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1990-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1990-MJMAIN-3B" x="6022" y="0"></use><use xlink:href="#E1990-MJMATHI-3B8" x="6467" y="0"></use><use xlink:href="#E1990-MJMAIN-29" x="6936" y="0"></use><use xlink:href="#E1990-MJMAIN-29" x="7325" y="0"></use></g></svg></span><script type="math/tex">E(\ln\pi(A_t \mid S_t;\theta))</script><span> ，这时使用这些样本进行有监督学习，则是更新 </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="1.089ex" height="1.939ex" viewBox="0 -749.6 469 834.7" role="img" focusable="false" style="vertical-align: -0.198ex;"><defs><path stroke-width="0" id="E2009-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 
130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E2009-MJMATHI-3B8" x="0" y="0"></use></g></svg></span><script type="math/tex">\theta</script><span> 以增大 </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="17.917ex" height="2.71ex" viewBox="0 -832.7 7714.4 1166.9" role="img" focusable="false" style="vertical-align: -0.776ex;"><defs><path stroke-width="0" id="E1990-MJMATHI-45" d="M492 213Q472 213 472 226Q472 230 477 250T482 285Q482 316 461 323T364 330H312Q311 328 277 192T243 52Q243 48 254 48T334 46Q428 46 458 48T518 61Q567 77 599 117T670 248Q680 270 683 272Q690 274 698 274Q718 274 718 261Q613 7 608 2Q605 0 322 0H133Q31 0 31 11Q31 13 34 25Q38 41 42 43T65 46Q92 46 125 49Q139 52 144 61Q146 66 215 342T285 622Q285 629 281 629Q273 632 228 634H197Q191 640 191 642T193 659Q197 676 203 680H757Q764 676 764 669Q764 664 751 557T737 447Q735 440 717 440H705Q698 445 698 453L701 476Q704 500 704 528Q704 558 697 578T678 609T643 625T596 632T532 634H485Q397 633 392 631Q388 629 386 622Q385 619 355 499T324 377Q347 376 372 376H398Q464 376 489 391T534 472Q538 488 540 490T557 493Q562 493 565 493T570 492T572 491T574 487T577 483L544 351Q511 218 508 216Q505 213 492 213Z"></path><path stroke-width="0" id="E1990-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1990-MJMAIN-6C" d="M42 46H56Q95 46 103 60V68Q103 77 103 91T103 124T104 167T104 217T104 
272T104 329Q104 366 104 407T104 482T104 542T103 586T103 603Q100 622 89 628T44 637H26V660Q26 683 28 683L38 684Q48 685 67 686T104 688Q121 689 141 690T171 693T182 694H185V379Q185 62 186 60Q190 52 198 49Q219 46 247 46H263V0H255L232 1Q209 2 183 2T145 3T107 3T57 1L34 0H26V46H42Z"></path><path stroke-width="0" id="E1990-MJMAIN-6E" d="M41 46H55Q94 46 102 60V68Q102 77 102 91T102 122T103 161T103 203Q103 234 103 269T102 328V351Q99 370 88 376T43 385H25V408Q25 431 27 431L37 432Q47 433 65 434T102 436Q119 437 138 438T167 441T178 442H181V402Q181 364 182 364T187 369T199 384T218 402T247 421T285 437Q305 442 336 442Q450 438 463 329Q464 322 464 190V104Q464 66 466 59T477 49Q498 46 526 46H542V0H534L510 1Q487 2 460 2T422 3Q319 3 310 0H302V46H318Q379 46 379 62Q380 64 380 200Q379 335 378 343Q372 371 358 385T334 402T308 404Q263 404 229 370Q202 343 195 315T187 232V168V108Q187 78 188 68T191 55T200 49Q221 46 249 46H265V0H257L234 1Q210 2 183 2T145 3Q42 3 33 0H25V46H41Z"></path><path stroke-width="0" id="E1990-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1990-MJMATHI-41" d="M208 74Q208 50 254 46Q272 46 272 35Q272 34 270 22Q267 8 264 4T251 0Q249 0 239 0T205 1T141 2Q70 2 50 0H42Q35 7 35 11Q37 38 48 46H62Q132 49 164 96Q170 102 345 401T523 704Q530 716 547 716H555H572Q578 707 578 706L606 383Q634 60 636 57Q641 46 701 46Q726 46 726 36Q726 34 723 22Q720 7 718 4T704 0Q701 0 690 0T651 1T578 2Q484 2 455 0H443Q437 6 437 9T439 27Q443 40 445 43L449 46H469Q523 49 533 63L521 213H283L249 155Q208 86 208 74ZM516 260Q516 271 504 416T490 562L463 519Q447 
492 400 412L310 260L413 259Q516 259 516 260Z"></path><path stroke-width="0" id="E1990-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E1990-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path stroke-width="0" id="E1990-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 462Q176 523 208 573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 577T585 459T569 456Q549 456 549 465Q549 471 550 475Q550 478 551 494T553 520Q553 554 544 579T526 616T501 641Q465 662 419 662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 -18T65 -22Q52 -22 52 -14Q52 -11 110 221Q112 227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 153Q144 114 160 87T203 47T255 29T308 24Z"></path><path stroke-width="0" id="E1990-MJMAIN-3B" d="M78 370Q78 394 95 412T138 430Q162 430 180 414T199 371Q199 346 182 328T139 310T96 327T78 370ZM78 60Q78 85 94 103T137 121Q202 121 202 8Q202 -44 183 -94T144 -169T118 -194Q115 -194 106 -186T95 -174Q94 -171 107 -155T137 -107T160 -38Q161 -32 162 -22T165 -4T165 4Q165 5 161 4T142 0Q110 0 94 18T78 60Z"></path><path stroke-width="0" id="E1990-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 
596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E1990-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1990-MJMATHI-45" x="0" y="0"></use><use xlink:href="#E1990-MJMAIN-28" x="764" y="0"></use><g transform="translate(1153,0)"><use xlink:href="#E1990-MJMAIN-6C"></use><use xlink:href="#E1990-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1990-MJMATHI-3C0" x="2153" y="0"></use><use xlink:href="#E1990-MJMAIN-28" x="2726" y="0"></use><g transform="translate(3115,0)"><use xlink:href="#E1990-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1990-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1990-MJMAIN-2223" x="4498" y="0"></use><g transform="translate(5054,0)"><use xlink:href="#E1990-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1990-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1990-MJMAIN-3B" x="6022" y="0"></use><use xlink:href="#E1990-MJMATHI-3B8" x="6467" y="0"></use><use xlink:href="#E1990-MJMAIN-29" x="6936" y="0"></use><use xlink:href="#E1990-MJMAIN-29" x="7325" y="0"></use></g></svg></span><script type="math/tex">E(\ln\pi(A_t \mid S_t;\theta))</script><span>（单个条目则为 </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="14.336ex" height="2.71ex" viewBox="0 -832.7 
6172.4 1166.9" role="img" focusable="false" style="vertical-align: -0.776ex;"><defs><path stroke-width="0" id="E1994-MJMAIN-6C" d="M42 46H56Q95 46 103 60V68Q103 77 103 91T103 124T104 167T104 217T104 272T104 329Q104 366 104 407T104 482T104 542T103 586T103 603Q100 622 89 628T44 637H26V660Q26 683 28 683L38 684Q48 685 67 686T104 688Q121 689 141 690T171 693T182 694H185V379Q185 62 186 60Q190 52 198 49Q219 46 247 46H263V0H255L232 1Q209 2 183 2T145 3T107 3T57 1L34 0H26V46H42Z"></path><path stroke-width="0" id="E1994-MJMAIN-6E" d="M41 46H55Q94 46 102 60V68Q102 77 102 91T102 122T103 161T103 203Q103 234 103 269T102 328V351Q99 370 88 376T43 385H25V408Q25 431 27 431L37 432Q47 433 65 434T102 436Q119 437 138 438T167 441T178 442H181V402Q181 364 182 364T187 369T199 384T218 402T247 421T285 437Q305 442 336 442Q450 438 463 329Q464 322 464 190V104Q464 66 466 59T477 49Q498 46 526 46H542V0H534L510 1Q487 2 460 2T422 3Q319 3 310 0H302V46H318Q379 46 379 62Q380 64 380 200Q379 335 378 343Q372 371 358 385T334 402T308 404Q263 404 229 370Q202 343 195 315T187 232V168V108Q187 78 188 68T191 55T200 49Q221 46 249 46H265V0H257L234 1Q210 2 183 2T145 3Q42 3 33 0H25V46H41Z"></path><path stroke-width="0" id="E1994-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1994-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path 
stroke-width="0" id="E1994-MJMATHI-41" d="M208 74Q208 50 254 46Q272 46 272 35Q272 34 270 22Q267 8 264 4T251 0Q249 0 239 0T205 1T141 2Q70 2 50 0H42Q35 7 35 11Q37 38 48 46H62Q132 49 164 96Q170 102 345 401T523 704Q530 716 547 716H555H572Q578 707 578 706L606 383Q634 60 636 57Q641 46 701 46Q726 46 726 36Q726 34 723 22Q720 7 718 4T704 0Q701 0 690 0T651 1T578 2Q484 2 455 0H443Q437 6 437 9T439 27Q443 40 445 43L449 46H469Q523 49 533 63L521 213H283L249 155Q208 86 208 74ZM516 260Q516 271 504 416T490 562L463 519Q447 492 400 412L310 260L413 259Q516 259 516 260Z"></path><path stroke-width="0" id="E1994-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E1994-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path stroke-width="0" id="E1994-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 462Q176 523 208 573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 577T585 459T569 456Q549 456 549 465Q549 471 550 475Q550 478 551 494T553 520Q553 554 544 579T526 616T501 641Q465 662 419 662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 -18T65 -22Q52 -22 52 -14Q52 -11 110 221Q112 227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 153Q144 114 160 
87T203 47T255 29T308 24Z"></path><path stroke-width="0" id="E1994-MJMAIN-3B" d="M78 370Q78 394 95 412T138 430Q162 430 180 414T199 371Q199 346 182 328T139 310T96 327T78 370ZM78 60Q78 85 94 103T137 121Q202 121 202 8Q202 -44 183 -94T144 -169T118 -194Q115 -194 106 -186T95 -174Q94 -171 107 -155T137 -107T160 -38Q161 -32 162 -22T165 -4T165 4Q165 5 161 4T142 0Q110 0 94 18T78 60Z"></path><path stroke-width="0" id="E1994-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E1994-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1994-MJMAIN-6C"></use><use xlink:href="#E1994-MJMAIN-6E" x="278" y="0"></use><use xlink:href="#E1994-MJMATHI-3C0" x="1000" y="0"></use><use xlink:href="#E1994-MJMAIN-28" x="1573" y="0"></use><g transform="translate(1962,0)"><use xlink:href="#E1994-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1994-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1994-MJMAIN-2223" x="3345" y="0"></use><g transform="translate(3901,0)"><use xlink:href="#E1994-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1994-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1994-MJMAIN-3B" x="4869" y="0"></use><use xlink:href="#E1994-MJMATHI-3B8" x="5314" y="0"></use><use 
xlink:href="#E1994-MJMAIN-29" x="5783" y="0"></use></g></svg></span><script type="math/tex">\ln\pi(A_t \mid S_t;\theta)</script><span> ），可以看出这里是目标  </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="20.421ex" height="2.71ex" viewBox="0 -832.7 8792.4 1166.9" role="img" focusable="false" style="vertical-align: -0.776ex;"><defs><path stroke-width="0" id="E1992-MJMATHI-45" d="M492 213Q472 213 472 226Q472 230 477 250T482 285Q482 316 461 323T364 330H312Q311 328 277 192T243 52Q243 48 254 48T334 46Q428 46 458 48T518 61Q567 77 599 117T670 248Q680 270 683 272Q690 274 698 274Q718 274 718 261Q613 7 608 2Q605 0 322 0H133Q31 0 31 11Q31 13 34 25Q38 41 42 43T65 46Q92 46 125 49Q139 52 144 61Q146 66 215 342T285 622Q285 629 281 629Q273 632 228 634H197Q191 640 191 642T193 659Q197 676 203 680H757Q764 676 764 669Q764 664 751 557T737 447Q735 440 717 440H705Q698 445 698 453L701 476Q704 500 704 528Q704 558 697 578T678 609T643 625T596 632T532 634H485Q397 633 392 631Q388 629 386 622Q385 619 355 499T324 377Q347 376 372 376H398Q464 376 489 391T534 472Q538 488 540 490T557 493Q562 493 565 493T570 492T572 491T574 487T577 483L544 351Q511 218 508 216Q505 213 492 213Z"></path><path stroke-width="0" id="E1992-MJMAIN-5B" d="M118 -250V750H255V710H158V-210H255V-250H118Z"></path><path stroke-width="0" id="E1992-MJMAIN-3A8" d="M340 622Q338 623 335 625T331 629T325 631T314 634T298 635T274 636T239 637H212V683H224Q248 680 389 680T554 683H566V637H539Q479 637 464 635T439 622L438 407Q438 192 439 192Q443 193 449 195T474 207T507 232T536 276T557 344Q560 365 562 417T573 493Q587 536 620 544Q627 546 671 546H715L722 540V515Q714 509 708 509Q680 505 671 476T658 392T644 307Q599 177 451 153L438 151V106L439 61Q446 54 451 52T476 48T539 46H566V0H554Q530 3 389 3T224 0H212V46H239Q259 46 273 46T298 47T314 48T325 51T331 54T335 57T340 61V151Q126 178 117 406Q115 503 69 509Q55 509 55 526Q55 541 59 543T86 546H107H120Q150 546 161 543T184 
528Q198 514 204 493Q212 472 213 420T226 316T272 230Q287 216 303 207T330 194L339 192Q340 192 340 407V622Z"></path><path stroke-width="0" id="E1992-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E1992-MJMAIN-6C" d="M42 46H56Q95 46 103 60V68Q103 77 103 91T103 124T104 167T104 217T104 272T104 329Q104 366 104 407T104 482T104 542T103 586T103 603Q100 622 89 628T44 637H26V660Q26 683 28 683L38 684Q48 685 67 686T104 688Q121 689 141 690T171 693T182 694H185V379Q185 62 186 60Q190 52 198 49Q219 46 247 46H263V0H255L232 1Q209 2 183 2T145 3T107 3T57 1L34 0H26V46H42Z"></path><path stroke-width="0" id="E1992-MJMAIN-6E" d="M41 46H55Q94 46 102 60V68Q102 77 102 91T102 122T103 161T103 203Q103 234 103 269T102 328V351Q99 370 88 376T43 385H25V408Q25 431 27 431L37 432Q47 433 65 434T102 436Q119 437 138 438T167 441T178 442H181V402Q181 364 182 364T187 369T199 384T218 402T247 421T285 437Q305 442 336 442Q450 438 463 329Q464 322 464 190V104Q464 66 466 59T477 49Q498 46 526 46H542V0H534L510 1Q487 2 460 2T422 3Q319 3 310 0H302V46H318Q379 46 379 62Q380 64 380 200Q379 335 378 343Q372 371 358 385T334 402T308 404Q263 404 229 370Q202 343 195 315T187 232V168V108Q187 78 188 68T191 55T200 49Q221 46 249 46H265V0H257L234 1Q210 2 183 2T145 3Q42 3 33 0H25V46H41Z"></path><path stroke-width="0" id="E1992-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 
371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1992-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1992-MJMATHI-41" d="M208 74Q208 50 254 46Q272 46 272 35Q272 34 270 22Q267 8 264 4T251 0Q249 0 239 0T205 1T141 2Q70 2 50 0H42Q35 7 35 11Q37 38 48 46H62Q132 49 164 96Q170 102 345 401T523 704Q530 716 547 716H555H572Q578 707 578 706L606 383Q634 60 636 57Q641 46 701 46Q726 46 726 36Q726 34 723 22Q720 7 718 4T704 0Q701 0 690 0T651 1T578 2Q484 2 455 0H443Q437 6 437 9T439 27Q443 40 445 43L449 46H469Q523 49 533 63L521 213H283L249 155Q208 86 208 74ZM516 260Q516 271 504 416T490 562L463 519Q447 492 400 412L310 260L413 259Q516 259 516 260Z"></path><path stroke-width="0" id="E1992-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path stroke-width="0" id="E1992-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 462Q176 523 208 573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 577T585 459T569 456Q549 456 549 465Q549 471 550 475Q550 478 551 494T553 520Q553 554 544 579T526 616T501 641Q465 662 419 662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 -18T65 -22Q52 -22 52 -14Q52 -11 110 221Q112 227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 
153Q144 114 160 87T203 47T255 29T308 24Z"></path><path stroke-width="0" id="E1992-MJMAIN-3B" d="M78 370Q78 394 95 412T138 430Q162 430 180 414T199 371Q199 346 182 328T139 310T96 327T78 370ZM78 60Q78 85 94 103T137 121Q202 121 202 8Q202 -44 183 -94T144 -169T118 -194Q115 -194 106 -186T95 -174Q94 -171 107 -155T137 -107T160 -38Q161 -32 162 -22T165 -4T165 4Q165 5 161 4T142 0Q110 0 94 18T78 60Z"></path><path stroke-width="0" id="E1992-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E1992-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path><path stroke-width="0" id="E1992-MJMAIN-5D" d="M22 710V750H159V-250H22V-210H119V710H22Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1992-MJMATHI-45" x="0" y="0"></use><use xlink:href="#E1992-MJMAIN-5B" x="764" y="0"></use><g transform="translate(1042,0)"><use xlink:href="#E1992-MJMAIN-3A8" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1992-MJMATHI-74" x="1100" y="-213"></use></g><g transform="translate(2341,0)"><use xlink:href="#E1992-MJMAIN-6C"></use><use xlink:href="#E1992-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E1992-MJMATHI-3C0" x="3342" y="0"></use><use xlink:href="#E1992-MJMAIN-28" x="3915" y="0"></use><g transform="translate(4304,0)"><use xlink:href="#E1992-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" 
xlink:href="#E1992-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1992-MJMAIN-2223" x="5687" y="0"></use><g transform="translate(6243,0)"><use xlink:href="#E1992-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1992-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1992-MJMAIN-3B" x="7211" y="0"></use><use xlink:href="#E1992-MJMATHI-3B8" x="7656" y="0"></use><use xlink:href="#E1992-MJMAIN-29" x="8125" y="0"></use><use xlink:href="#E1992-MJMAIN-5D" x="8514" y="0"></use></g></svg></span><script type="math/tex">E[\Psi_t\ln\pi(A_t \mid S_t;\theta)]</script><span> 中取 </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="6.891ex" height="2.228ex" viewBox="0 -749.6 2966.8 959.2" role="img" focusable="false" style="vertical-align: -0.487ex;"><defs><path stroke-width="0" id="E1993-MJMAIN-3A8" d="M340 622Q338 623 335 625T331 629T325 631T314 634T298 635T274 636T239 637H212V683H224Q248 680 389 680T554 683H566V637H539Q479 637 464 635T439 622L438 407Q438 192 439 192Q443 193 449 195T474 207T507 232T536 276T557 344Q560 365 562 417T573 493Q587 536 620 544Q627 546 671 546H715L722 540V515Q714 509 708 509Q680 505 671 476T658 392T644 307Q599 177 451 153L438 151V106L439 61Q446 54 451 52T476 48T539 46H566V0H554Q530 3 389 3T224 0H212V46H239Q259 46 273 46T298 47T314 48T325 51T331 54T335 57T340 61V151Q126 178 117 406Q115 503 69 509Q55 509 55 526Q55 541 59 543T86 546H107H120Q150 546 161 543T184 528Q198 514 204 493Q212 472 213 420T226 316T272 230Q287 216 303 207T330 194L339 192Q340 192 340 407V622Z"></path><path stroke-width="0" id="E1993-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 
144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E1993-MJMAIN-3D" d="M56 347Q56 360 70 367H707Q722 359 722 347Q722 336 708 328L390 327H72Q56 332 56 347ZM56 153Q56 168 72 173H708Q722 163 722 153Q722 140 707 133H70Q56 140 56 153Z"></path><path stroke-width="0" id="E1993-MJMAIN-31" d="M213 578L200 573Q186 568 160 563T102 556H83V602H102Q149 604 189 617T245 641T273 663Q275 666 285 666Q294 666 302 660V361L303 61Q310 54 315 52T339 48T401 46H427V0H416Q395 3 257 3Q121 3 100 0H88V46H114Q136 46 152 46T177 47T193 50T201 52T207 57T213 61V578Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1993-MJMAIN-3A8" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1993-MJMATHI-74" x="1100" y="-213"></use><use xlink:href="#E1993-MJMAIN-3D" x="1411" y="0"></use><use xlink:href="#E1993-MJMAIN-31" x="2466" y="0"></use></g></svg></span><script type="math/tex">\Psi_t=1</script><span> 时得到的，在形式上具有相似性。事实上，策略梯度算法在学习过程中巧妙地利用观测到的奖励信号决定每步对数似然值 </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="14.336ex" height="2.71ex" viewBox="0 -832.7 6172.4 1166.9" role="img" focusable="false" style="vertical-align: -0.776ex;"><defs><path stroke-width="0" id="E1994-MJMAIN-6C" d="M42 46H56Q95 46 103 60V68Q103 77 103 91T103 124T104 167T104 217T104 272T104 329Q104 366 104 407T104 482T104 542T103 586T103 603Q100 622 89 628T44 637H26V660Q26 683 28 683L38 684Q48 685 67 686T104 688Q121 689 141 690T171 693T182 694H185V379Q185 62 186 60Q190 52 198 49Q219 46 247 46H263V0H255L232 1Q209 2 183 2T145 3T107 3T57 1L34 0H26V46H42Z"></path><path stroke-width="0" id="E1994-MJMAIN-6E" d="M41 46H55Q94 46 102 60V68Q102 77 102 91T102 122T103 161T103 203Q103 234 103 269T102 328V351Q99 
370 88 376T43 385H25V408Q25 431 27 431L37 432Q47 433 65 434T102 436Q119 437 138 438T167 441T178 442H181V402Q181 364 182 364T187 369T199 384T218 402T247 421T285 437Q305 442 336 442Q450 438 463 329Q464 322 464 190V104Q464 66 466 59T477 49Q498 46 526 46H542V0H534L510 1Q487 2 460 2T422 3Q319 3 310 0H302V46H318Q379 46 379 62Q380 64 380 200Q379 335 378 343Q372 371 358 385T334 402T308 404Q263 404 229 370Q202 343 195 315T187 232V168V108Q187 78 188 68T191 55T200 49Q221 46 249 46H265V0H257L234 1Q210 2 183 2T145 3Q42 3 33 0H25V46H41Z"></path><path stroke-width="0" id="E1994-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1994-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1994-MJMATHI-41" d="M208 74Q208 50 254 46Q272 46 272 35Q272 34 270 22Q267 8 264 4T251 0Q249 0 239 0T205 1T141 2Q70 2 50 0H42Q35 7 35 11Q37 38 48 46H62Q132 49 164 96Q170 102 345 401T523 704Q530 716 547 716H555H572Q578 707 578 706L606 383Q634 60 636 57Q641 46 701 46Q726 46 726 36Q726 34 723 22Q720 7 718 4T704 0Q701 0 690 0T651 1T578 2Q484 2 455 0H443Q437 6 437 9T439 27Q443 40 445 43L449 46H469Q523 49 533 63L521 213H283L249 155Q208 86 208 74ZM516 260Q516 271 504 416T490 562L463 519Q447 492 400 412L310 260L413 259Q516 259 516 260Z"></path><path stroke-width="0" id="E1994-MJMATHI-74" d="M26 385Q19 392 19 395Q19 
399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E1994-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path stroke-width="0" id="E1994-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 462Q176 523 208 573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 577T585 459T569 456Q549 456 549 465Q549 471 550 475Q550 478 551 494T553 520Q553 554 544 579T526 616T501 641Q465 662 419 662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 -18T65 -22Q52 -22 52 -14Q52 -11 110 221Q112 227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 153Q144 114 160 87T203 47T255 29T308 24Z"></path><path stroke-width="0" id="E1994-MJMAIN-3B" d="M78 370Q78 394 95 412T138 430Q162 430 180 414T199 371Q199 346 182 328T139 310T96 327T78 370ZM78 60Q78 85 94 103T137 121Q202 121 202 8Q202 -44 183 -94T144 -169T118 -194Q115 -194 106 -186T95 -174Q94 -171 107 -155T137 -107T160 -38Q161 -32 162 -22T165 -4T165 4Q165 5 161 4T142 0Q110 0 94 18T78 60Z"></path><path stroke-width="0" id="E1994-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 
521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E1994-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1994-MJMAIN-6C"></use><use xlink:href="#E1994-MJMAIN-6E" x="278" y="0"></use><use xlink:href="#E1994-MJMATHI-3C0" x="1000" y="0"></use><use xlink:href="#E1994-MJMAIN-28" x="1573" y="0"></use><g transform="translate(1962,0)"><use xlink:href="#E1994-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1994-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E1994-MJMAIN-2223" x="3345" y="0"></use><g transform="translate(3901,0)"><use xlink:href="#E1994-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E1994-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E1994-MJMAIN-3B" x="4869" y="0"></use><use xlink:href="#E1994-MJMATHI-3B8" x="5314" y="0"></use><use xlink:href="#E1994-MJMAIN-29" x="5783" y="0"></use></g></svg></span><script type="math/tex">\ln\pi(A_t \mid S_t;\theta)</script><span> 对策略奖励的贡献，为其加权为 </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="2.632ex" height="2.228ex" viewBox="0 -749.6 1133.3 959.2" role="img" focusable="false" style="vertical-align: -0.487ex;"><defs><path stroke-width="0" id="E2010-MJMAIN-3A8" d="M340 622Q338 623 335 625T331 629T325 631T314 634T298 635T274 636T239 637H212V683H224Q248 680 389 680T554 683H566V637H539Q479 637 464 635T439 622L438 407Q438 192 439 192Q443 193 449 195T474 
207T507 232T536 276T557 344Q560 365 562 417T573 493Q587 536 620 544Q627 546 671 546H715L722 540V515Q714 509 708 509Q680 505 671 476T658 392T644 307Q599 177 451 153L438 151V106L439 61Q446 54 451 52T476 48T539 46H566V0H554Q530 3 389 3T224 0H212V46H239Q259 46 273 46T298 47T314 48T325 51T331 54T335 57T340 61V151Q126 178 117 406Q115 503 69 509Q55 509 55 526Q55 541 59 543T86 546H107H120Q150 546 161 543T184 528Q198 514 204 493Q212 472 213 420T226 316T272 230Q287 216 303 207T330 194L339 192Q340 192 340 407V622Z"></path><path stroke-width="0" id="E2010-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E2010-MJMAIN-3A8" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2010-MJMATHI-74" x="1100" y="-213"></use></g></svg></span><script type="math/tex">\Psi_t</script><span> ，使得表现好的行为策略更新幅度大，更加倾向于出现；表现很差的行为策略更新幅度很小，更加倾向于不出现；最终使得整个策略 </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="4.227ex" height="2.71ex" viewBox="0 -832.7 1820 1166.9" role="img" focusable="false" style="vertical-align: -0.776ex;"><defs><path stroke-width="0" id="E1996-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 
341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E1996-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E1996-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E1996-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E1996-MJMATHI-3C0" x="0" y="0"></use><use xlink:href="#E1996-MJMAIN-28" x="573" y="0"></use><use xlink:href="#E1996-MJMATHI-3B8" x="962" y="0"></use><use xlink:href="#E1996-MJMAIN-29" x="1431" y="0"></use></g></svg></span><script type="math/tex">\pi(\theta)</script><span> 变得越来越好。</span></p><h3><a name="五案例车杆平衡cartpole-v0）" class="md-header-anchor"></a><span>五、案例：车杆平衡（CartPole-v0）</span></h3><p><span>使用 Gym 库里的车杆平衡问题（CartPole-v0）作为案例分析。该问题的环境为一个小车（cart）上连着一根杆（pole），目的是控制小车左右移动，使得杆保持直立；环境的观测值、动作值、奖励值、起始状态、回合结束标志在</span><a href='https://github.com/openai/gym/blob/master/gym/envs/classic_control/cartpole.py' 
target='_blank' title='CartPole-v1'><span>源代码</span></a><span>中有描述，此处不再赘述。</span></p><p><span>在使用书中的同策策略梯度算法求解最优策略的代码时，经多次测试发现该代码的收敛性较差，训练智能体过程的回合奖励值变化大多呈下降趋势，在尝试调节学习率后，最好的情况也是呈现波动趋势，且测试结果的平均奖励值不超过 50 。经调试，造成该结果的可能原因是书中代码直接将 </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="14.644ex" height="2.71ex" viewBox="0 -832.7 6305 1166.9" role="img" focusable="false" style="vertical-align: -0.776ex;"><defs><path stroke-width="0" id="E2001-MJMAIN-3A8" d="M340 622Q338 623 335 625T331 629T325 631T314 634T298 635T274 636T239 637H212V683H224Q248 680 389 680T554 683H566V637H539Q479 637 464 635T439 622L438 407Q438 192 439 192Q443 193 449 195T474 207T507 232T536 276T557 344Q560 365 562 417T573 493Q587 536 620 544Q627 546 671 546H715L722 540V515Q714 509 708 509Q680 505 671 476T658 392T644 307Q599 177 451 153L438 151V106L439 61Q446 54 451 52T476 48T539 46H566V0H554Q530 3 389 3T224 0H212V46H239Q259 46 273 46T298 47T314 48T325 51T331 54T335 57T340 61V151Q126 178 117 406Q115 503 69 509Q55 509 55 526Q55 541 59 543T86 546H107H120Q150 546 161 543T184 528Q198 514 204 493Q212 472 213 420T226 316T272 230Q287 216 303 207T330 194L339 192Q340 192 340 407V622Z"></path><path stroke-width="0" id="E2001-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E2001-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 
391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E2001-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E2001-MJMATHI-41" d="M208 74Q208 50 254 46Q272 46 272 35Q272 34 270 22Q267 8 264 4T251 0Q249 0 239 0T205 1T141 2Q70 2 50 0H42Q35 7 35 11Q37 38 48 46H62Q132 49 164 96Q170 102 345 401T523 704Q530 716 547 716H555H572Q578 707 578 706L606 383Q634 60 636 57Q641 46 701 46Q726 46 726 36Q726 34 723 22Q720 7 718 4T704 0Q701 0 690 0T651 1T578 2Q484 2 455 0H443Q437 6 437 9T439 27Q443 40 445 43L449 46H469Q523 49 533 63L521 213H283L249 155Q208 86 208 74ZM516 260Q516 271 504 416T490 562L463 519Q447 492 400 412L310 260L413 259Q516 259 516 260Z"></path><path stroke-width="0" id="E2001-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path stroke-width="0" id="E2001-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 462Q176 523 208 573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 577T585 459T569 456Q549 456 549 465Q549 471 550 475Q550 478 551 494T553 520Q553 554 544 579T526 616T501 641Q465 662 419 662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 -18T65 -22Q52 -22 52 -14Q52 -11 110 221Q112 
227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 153Q144 114 160 87T203 47T255 29T308 24Z"></path><path stroke-width="0" id="E2001-MJMAIN-3B" d="M78 370Q78 394 95 412T138 430Q162 430 180 414T199 371Q199 346 182 328T139 310T96 327T78 370ZM78 60Q78 85 94 103T137 121Q202 121 202 8Q202 -44 183 -94T144 -169T118 -194Q115 -194 106 -186T95 -174Q94 -171 107 -155T137 -107T160 -38Q161 -32 162 -22T165 -4T165 4Q165 5 161 4T142 0Q110 0 94 18T78 60Z"></path><path stroke-width="0" id="E2001-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E2001-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E2001-MJMAIN-3A8" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2001-MJMATHI-74" x="1100" y="-213"></use><use xlink:href="#E2001-MJMATHI-3C0" x="1133" y="0"></use><use xlink:href="#E2001-MJMAIN-28" x="1706" y="0"></use><g transform="translate(2095,0)"><use xlink:href="#E2001-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2001-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E2001-MJMAIN-2223" x="3478" y="0"></use><g transform="translate(4034,0)"><use xlink:href="#E2001-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2001-MJMATHI-74" x="866" y="-213"></use></g><use 
xlink:href="#E2001-MJMAIN-3B" x="5002" y="0"></use><use xlink:href="#E2001-MJMATHI-3B8" x="5447" y="0"></use><use xlink:href="#E2001-MJMAIN-29" x="5916" y="0"></use></g></svg></span><script type="math/tex">\Psi_t\pi(A_t \mid S_t;\theta)</script><span> 作为输出结果进行训练，而实际上网络输出层的激活函数是 </span><code>softmax</code><span> ，输出的应该是 </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="12.012ex" height="2.71ex" viewBox="0 -832.7 5171.8 1166.9" role="img" focusable="false" style="vertical-align: -0.776ex;"><defs><path stroke-width="0" id="E2005-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E2005-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E2005-MJMATHI-41" d="M208 74Q208 50 254 46Q272 46 272 35Q272 34 270 22Q267 8 264 4T251 0Q249 0 239 0T205 1T141 2Q70 2 50 0H42Q35 7 35 11Q37 38 48 46H62Q132 49 164 96Q170 102 345 401T523 704Q530 716 547 716H555H572Q578 707 578 706L606 383Q634 60 636 57Q641 46 701 46Q726 46 726 36Q726 34 723 22Q720 7 718 4T704 0Q701 0 690 0T651 1T578 2Q484 2 455 0H443Q437 6 437 9T439 27Q443 40 445 43L449 46H469Q523 49 533 63L521 213H283L249 155Q208 86 208 74ZM516 260Q516 271 504 416T490 562L463 519Q447 492 400 412L310 260L413 259Q516 259 516 260Z"></path><path 
stroke-width="0" id="E2005-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E2005-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path stroke-width="0" id="E2005-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 462Q176 523 208 573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 577T585 459T569 456Q549 456 549 465Q549 471 550 475Q550 478 551 494T553 520Q553 554 544 579T526 616T501 641Q465 662 419 662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 -18T65 -22Q52 -22 52 -14Q52 -11 110 221Q112 227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 153Q144 114 160 87T203 47T255 29T308 24Z"></path><path stroke-width="0" id="E2005-MJMAIN-3B" d="M78 370Q78 394 95 412T138 430Q162 430 180 414T199 371Q199 346 182 328T139 310T96 327T78 370ZM78 60Q78 85 94 103T137 121Q202 121 202 8Q202 -44 183 -94T144 -169T118 -194Q115 -194 106 -186T95 -174Q94 -171 107 -155T137 -107T160 -38Q161 -32 162 -22T165 -4T165 4Q165 5 161 4T142 0Q110 0 94 18T78 60Z"></path><path stroke-width="0" id="E2005-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 
10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E2005-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E2005-MJMATHI-3C0" x="0" y="0"></use><use xlink:href="#E2005-MJMAIN-28" x="573" y="0"></use><g transform="translate(962,0)"><use xlink:href="#E2005-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2005-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E2005-MJMAIN-2223" x="2345" y="0"></use><g transform="translate(2900,0)"><use xlink:href="#E2005-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2005-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E2005-MJMAIN-3B" x="3869" y="0"></use><use xlink:href="#E2005-MJMATHI-3B8" x="4313" y="0"></use><use xlink:href="#E2005-MJMAIN-29" x="4782" y="0"></use></g></svg></span><script type="math/tex">\pi(A_t \mid S_t;\theta)</script><span> 。</span></p><p><span>具体而言就是，假设以 </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="14.644ex" height="2.71ex" viewBox="0 -832.7 6305 1166.9" role="img" focusable="false" style="vertical-align: -0.776ex;"><defs><path stroke-width="0" id="E2001-MJMAIN-3A8" d="M340 622Q338 623 335 625T331 629T325 631T314 634T298 635T274 636T239 637H212V683H224Q248 680 389 680T554 683H566V637H539Q479 637 464 635T439 622L438 407Q438 192 439 192Q443 193 449 195T474 207T507 232T536 
276T557 344Q560 365 562 417T573 493Q587 536 620 544Q627 546 671 546H715L722 540V515Q714 509 708 509Q680 505 671 476T658 392T644 307Q599 177 451 153L438 151V106L439 61Q446 54 451 52T476 48T539 46H566V0H554Q530 3 389 3T224 0H212V46H239Q259 46 273 46T298 47T314 48T325 51T331 54T335 57T340 61V151Q126 178 117 406Q115 503 69 509Q55 509 55 526Q55 541 59 543T86 546H107H120Q150 546 161 543T184 528Q198 514 204 493Q212 472 213 420T226 316T272 230Q287 216 303 207T330 194L339 192Q340 192 340 407V622Z"></path><path stroke-width="0" id="E2001-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E2001-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E2001-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E2001-MJMATHI-41" d="M208 74Q208 50 254 46Q272 46 272 35Q272 34 270 22Q267 8 264 4T251 0Q249 0 239 0T205 1T141 
2Q70 2 50 0H42Q35 7 35 11Q37 38 48 46H62Q132 49 164 96Q170 102 345 401T523 704Q530 716 547 716H555H572Q578 707 578 706L606 383Q634 60 636 57Q641 46 701 46Q726 46 726 36Q726 34 723 22Q720 7 718 4T704 0Q701 0 690 0T651 1T578 2Q484 2 455 0H443Q437 6 437 9T439 27Q443 40 445 43L449 46H469Q523 49 533 63L521 213H283L249 155Q208 86 208 74ZM516 260Q516 271 504 416T490 562L463 519Q447 492 400 412L310 260L413 259Q516 259 516 260Z"></path><path stroke-width="0" id="E2001-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path stroke-width="0" id="E2001-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 462Q176 523 208 573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 577T585 459T569 456Q549 456 549 465Q549 471 550 475Q550 478 551 494T553 520Q553 554 544 579T526 616T501 641Q465 662 419 662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 -18T65 -22Q52 -22 52 -14Q52 -11 110 221Q112 227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 153Q144 114 160 87T203 47T255 29T308 24Z"></path><path stroke-width="0" id="E2001-MJMAIN-3B" d="M78 370Q78 394 95 412T138 430Q162 430 180 414T199 371Q199 346 182 328T139 310T96 327T78 370ZM78 60Q78 85 94 103T137 121Q202 121 202 8Q202 -44 183 -94T144 -169T118 -194Q115 -194 106 -186T95 -174Q94 -171 107 -155T137 -107T160 -38Q161 -32 162 -22T165 -4T165 4Q165 5 161 4T142 0Q110 0 94 18T78 60Z"></path><path stroke-width="0" id="E2001-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 
370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E2001-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E2001-MJMAIN-3A8" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2001-MJMATHI-74" x="1100" y="-213"></use><use xlink:href="#E2001-MJMATHI-3C0" x="1133" y="0"></use><use xlink:href="#E2001-MJMAIN-28" x="1706" y="0"></use><g transform="translate(2095,0)"><use xlink:href="#E2001-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2001-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E2001-MJMAIN-2223" x="3478" y="0"></use><g transform="translate(4034,0)"><use xlink:href="#E2001-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2001-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E2001-MJMAIN-3B" x="5002" y="0"></use><use xlink:href="#E2001-MJMATHI-3B8" x="5447" y="0"></use><use xlink:href="#E2001-MJMAIN-29" x="5916" y="0"></use></g></svg></span><script type="math/tex">\Psi_t\pi(A_t \mid S_t;\theta)</script><span> 作为输出结果来训练，由于训练网络的损失函数为交叉熵函数，网络各个输出结果的值域为 </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="4.647ex" height="2.71ex" viewBox="0 -832.7 2000.7 1166.9" role="img" focusable="false" style="vertical-align: -0.776ex;"><defs><path stroke-width="0" id="E2000-MJMAIN-5B" d="M118 -250V750H255V710H158V-210H255V-250H118Z"></path><path stroke-width="0" id="E2000-MJMAIN-30" d="M96 585Q152 666 249 666Q297 666 345 640T423 548Q460 
465 460 320Q460 165 417 83Q397 41 362 16T301 -15T250 -22Q224 -22 198 -16T137 16T82 83Q39 165 39 320Q39 494 96 585ZM321 597Q291 629 250 629Q208 629 178 597Q153 571 145 525T137 333Q137 175 145 125T181 46Q209 16 250 16Q290 16 318 46Q347 76 354 130T362 333Q362 478 354 524T321 597Z"></path><path stroke-width="0" id="E2000-MJMAIN-2C" d="M78 35T78 60T94 103T137 121Q165 121 187 96T210 8Q210 -27 201 -60T180 -117T154 -158T130 -185T117 -194Q113 -194 104 -185T95 -172Q95 -168 106 -156T131 -126T157 -76T173 -3V9L172 8Q170 7 167 6T161 3T152 1T140 0Q113 0 96 17Z"></path><path stroke-width="0" id="E2000-MJMAIN-31" d="M213 578L200 573Q186 568 160 563T102 556H83V602H102Q149 604 189 617T245 641T273 663Q275 666 285 666Q294 666 302 660V361L303 61Q310 54 315 52T339 48T401 46H427V0H416Q395 3 257 3Q121 3 100 0H88V46H114Q136 46 152 46T177 47T193 50T201 52T207 57T213 61V578Z"></path><path stroke-width="0" id="E2000-MJMAIN-5D" d="M22 710V750H159V-250H22V-210H119V710H22Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E2000-MJMAIN-5B" x="0" y="0"></use><use xlink:href="#E2000-MJMAIN-30" x="278" y="0"></use><use xlink:href="#E2000-MJMAIN-2C" x="778" y="0"></use><use xlink:href="#E2000-MJMAIN-31" x="1222" y="0"></use><use xlink:href="#E2000-MJMAIN-5D" x="1722" y="0"></use></g></svg></span><script type="math/tex">[0,1]</script><span> ，为了使损失函数降低，训练会使网络的输出结果趋向于 0 或 1 ，即与环境交互得到新样本的 </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="14.644ex" height="2.71ex" viewBox="0 -832.7 6305 1166.9" role="img" focusable="false" style="vertical-align: -0.776ex;"><defs><path stroke-width="0" id="E2001-MJMAIN-3A8" d="M340 622Q338 623 335 625T331 629T325 631T314 634T298 635T274 636T239 637H212V683H224Q248 680 389 680T554 683H566V637H539Q479 637 464 635T439 622L438 407Q438 192 439 192Q443 193 449 195T474 207T507 232T536 276T557 344Q560 365 
562 417T573 493Q587 536 620 544Q627 546 671 546H715L722 540V515Q714 509 708 509Q680 505 671 476T658 392T644 307Q599 177 451 153L438 151V106L439 61Q446 54 451 52T476 48T539 46H566V0H554Q530 3 389 3T224 0H212V46H239Q259 46 273 46T298 47T314 48T325 51T331 54T335 57T340 61V151Q126 178 117 406Q115 503 69 509Q55 509 55 526Q55 541 59 543T86 546H107H120Q150 546 161 543T184 528Q198 514 204 493Q212 472 213 420T226 316T272 230Q287 216 303 207T330 194L339 192Q340 192 340 407V622Z"></path><path stroke-width="0" id="E2001-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E2001-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E2001-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E2001-MJMATHI-41" d="M208 74Q208 50 254 46Q272 46 272 35Q272 34 270 22Q267 8 264 4T251 0Q249 0 239 0T205 1T141 2Q70 2 50 0H42Q35 7 
35 11Q37 38 48 46H62Q132 49 164 96Q170 102 345 401T523 704Q530 716 547 716H555H572Q578 707 578 706L606 383Q634 60 636 57Q641 46 701 46Q726 46 726 36Q726 34 723 22Q720 7 718 4T704 0Q701 0 690 0T651 1T578 2Q484 2 455 0H443Q437 6 437 9T439 27Q443 40 445 43L449 46H469Q523 49 533 63L521 213H283L249 155Q208 86 208 74ZM516 260Q516 271 504 416T490 562L463 519Q447 492 400 412L310 260L413 259Q516 259 516 260Z"></path><path stroke-width="0" id="E2001-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path stroke-width="0" id="E2001-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 462Q176 523 208 573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 577T585 459T569 456Q549 456 549 465Q549 471 550 475Q550 478 551 494T553 520Q553 554 544 579T526 616T501 641Q465 662 419 662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 -18T65 -22Q52 -22 52 -14Q52 -11 110 221Q112 227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 153Q144 114 160 87T203 47T255 29T308 24Z"></path><path stroke-width="0" id="E2001-MJMAIN-3B" d="M78 370Q78 394 95 412T138 430Q162 430 180 414T199 371Q199 346 182 328T139 310T96 327T78 370ZM78 60Q78 85 94 103T137 121Q202 121 202 8Q202 -44 183 -94T144 -169T118 -194Q115 -194 106 -186T95 -174Q94 -171 107 -155T137 -107T160 -38Q161 -32 162 -22T165 -4T165 4Q165 5 161 4T142 0Q110 0 94 18T78 60Z"></path><path stroke-width="0" id="E2001-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 
404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E2001-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E2001-MJMAIN-3A8" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2001-MJMATHI-74" x="1100" y="-213"></use><use xlink:href="#E2001-MJMATHI-3C0" x="1133" y="0"></use><use xlink:href="#E2001-MJMAIN-28" x="1706" y="0"></use><g transform="translate(2095,0)"><use xlink:href="#E2001-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2001-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E2001-MJMAIN-2223" x="3478" y="0"></use><g transform="translate(4034,0)"><use xlink:href="#E2001-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2001-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E2001-MJMAIN-3B" x="5002" y="0"></use><use xlink:href="#E2001-MJMATHI-3B8" x="5447" y="0"></use><use xlink:href="#E2001-MJMAIN-29" x="5916" y="0"></use></g></svg></span><script type="math/tex">\Psi_t\pi(A_t \mid S_t;\theta)</script><span> value tends toward 0 or 1, since only then can the loss keep decreasing; in practice, however, for new samples collected in this environment the </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="2.632ex" height="2.228ex" viewBox="0 -749.6 1133.3 959.2" role="img" focusable="false" style="vertical-align: -0.487ex;"><defs><path stroke-width="0" id="E2010-MJMAIN-3A8" d="M340 622Q338 623 335 625T331 629T325 631T314 634T298 635T274 636T239 637H212V683H224Q248 680 389 680T554 683H566V637H539Q479 637 464 635T439 622L438 407Q438 192 439 192Q443 
193 449 195T474 207T507 232T536 276T557 344Q560 365 562 417T573 493Q587 536 620 544Q627 546 671 546H715L722 540V515Q714 509 708 509Q680 505 671 476T658 392T644 307Q599 177 451 153L438 151V106L439 61Q446 54 451 52T476 48T539 46H566V0H554Q530 3 389 3T224 0H212V46H239Q259 46 273 46T298 47T314 48T325 51T331 54T335 57T340 61V151Q126 178 117 406Q115 503 69 509Q55 509 55 526Q55 541 59 543T86 546H107H120Q150 546 161 543T184 528Q198 514 204 493Q212 472 213 420T226 316T272 230Q287 216 303 207T330 194L339 192Q340 192 340 407V622Z"></path><path stroke-width="0" id="E2010-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E2010-MJMAIN-3A8" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2010-MJMATHI-74" x="1100" y="-213"></use></g></svg></span><script type="math/tex">\Psi_t</script><span> values are almost all greater than 1, so the training procedure above only drives </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="2.632ex" height="2.228ex" viewBox="0 -749.6 1133.3 959.2" role="img" focusable="false" style="vertical-align: -0.487ex;"><defs><path stroke-width="0" id="E2010-MJMAIN-3A8" d="M340 622Q338 623 335 625T331 629T325 631T314 634T298 635T274 636T239 637H212V683H224Q248 680 389 680T554 683H566V637H539Q479 637 464 635T439 622L438 407Q438 192 439 192Q443 193 449 195T474 207T507 232T536 276T557 344Q560 365 562 417T573 493Q587 536 620 544Q627 546 671 
546H715L722 540V515Q714 509 708 509Q680 505 671 476T658 392T644 307Q599 177 451 153L438 151V106L439 61Q446 54 451 52T476 48T539 46H566V0H554Q530 3 389 3T224 0H212V46H239Q259 46 273 46T298 47T314 48T325 51T331 54T335 57T340 61V151Q126 178 117 406Q115 503 69 509Q55 509 55 526Q55 541 59 543T86 546H107H120Q150 546 161 543T184 528Q198 514 204 493Q212 472 213 420T226 316T272 230Q287 216 303 207T330 194L339 192Q340 192 340 407V622Z"></path><path stroke-width="0" id="E2010-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E2010-MJMAIN-3A8" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2010-MJMATHI-74" x="1100" y="-213"></use></g></svg></span><script type="math/tex">\Psi_t</script><span> smaller and smaller, as close to 1 as possible, and the algorithm ultimately fails to converge or even diverges. The code before the change is as follows:</span></p><pre spellcheck="false" class="md-fences md-end-block md-fences-with-lineno ty-contain-cm modeLoaded" lang="python"><div class="CodeMirror cm-s-inner CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; height: 0px; top: 0px; left: 33px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"></textarea></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"></div><div class="CodeMirror-gutter-filler" cm-not-content="true"></div><div class="CodeMirror-scroll" tabindex="-1"><div 
class="CodeMirror-sizer" style="margin-left: 29px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre><div class="CodeMirror-linenumber CodeMirror-gutter-elt"><div>2</div></div></div><div class="CodeMirror-measure"></div><div style="position: relative; z-index: 1;"></div><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"></div><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: -29px; width: 29px;"></div><div class="CodeMirror-gutter-wrapper CodeMirror-activeline-gutter" style="left: -29px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt CodeMirror-linenumber-show" style="left: 0px; width: 20px;">1</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">y</span> = <span class="cm-variable">np</span>.<span class="cm-property">eye</span>(<span class="cm-variable-2">self</span>.<span class="cm-property">action_n</span>)[<span class="cm-variable">df</span>[<span class="cm-string">"action"</span>]] <span class="cm-operator">*</span> <span class="cm-variable">df</span>[<span class="cm-string">"psi"</span>].<span class="cm-property">values</span>[:, <span class="cm-variable">np</span>.<span class="cm-property">newaxis</span>]</span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -29px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt CodeMirror-linenumber-show" style="left: 0px; width: 20px;">2</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" 
style="padding-right: 0.1px;"><span class="cm-variable-2">self</span>.<span class="cm-property">policy_net</span>.<span class="cm-property">fit</span>(<span class="cm-variable">x</span>, <span class="cm-variable">y</span>, <span class="cm-variable">verbose</span>=<span class="cm-number">0</span>)</span></pre></div></div></div></div></div></div><div style="position: absolute; height: 0px; width: 1px; border-bottom: 0px solid transparent; top: 64px;"></div><div class="CodeMirror-gutters" style="height: 64px;"><div class="CodeMirror-gutter CodeMirror-linenumbers" style="width: 28px;"></div></div></div></div></pre><p><span>The fix is to use </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="2.632ex" height="2.228ex" viewBox="0 -749.6 1133.3 959.2" role="img" focusable="false" style="vertical-align: -0.487ex;"><defs><path stroke-width="0" id="E2010-MJMAIN-3A8" d="M340 622Q338 623 335 625T331 629T325 631T314 634T298 635T274 636T239 637H212V683H224Q248 680 389 680T554 683H566V637H539Q479 637 464 635T439 622L438 407Q438 192 439 192Q443 193 449 195T474 207T507 232T536 276T557 344Q560 365 562 417T573 493Q587 536 620 544Q627 546 671 546H715L722 540V515Q714 509 708 509Q680 505 671 476T658 392T644 307Q599 177 451 153L438 151V106L439 61Q446 54 451 52T476 48T539 46H566V0H554Q530 3 389 3T224 0H212V46H239Q259 46 273 46T298 47T314 48T325 51T331 54T335 57T340 61V151Q126 178 117 406Q115 503 69 509Q55 509 55 526Q55 541 59 543T86 546H107H120Q150 546 161 543T184 528Q198 514 204 493Q212 472 213 420T226 316T272 230Q287 216 303 207T330 194L339 192Q340 192 340 407V622Z"></path><path stroke-width="0" id="E2010-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 
162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E2010-MJMAIN-3A8" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2010-MJMATHI-74" x="1100" y="-213"></use></g></svg></span><script type="math/tex">\Psi_t</script><span> as the sample weight during training, with </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="12.012ex" height="2.71ex" viewBox="0 -832.7 5171.8 1166.9" role="img" focusable="false" style="vertical-align: -0.776ex;"><defs><path stroke-width="0" id="E2005-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E2005-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E2005-MJMATHI-41" d="M208 74Q208 50 254 46Q272 46 272 35Q272 34 270 22Q267 8 264 4T251 0Q249 0 239 0T205 1T141 2Q70 2 50 0H42Q35 7 35 11Q37 38 48 46H62Q132 49 164 96Q170 102 345 401T523 704Q530 716 547 716H555H572Q578 707 578 706L606 383Q634 60 636 57Q641 46 701 46Q726 46 726 36Q726 34 723 22Q720 7 718 4T704 0Q701 0 690 0T651 1T578 2Q484 2 455 
0H443Q437 6 437 9T439 27Q443 40 445 43L449 46H469Q523 49 533 63L521 213H283L249 155Q208 86 208 74ZM516 260Q516 271 504 416T490 562L463 519Q447 492 400 412L310 260L413 259Q516 259 516 260Z"></path><path stroke-width="0" id="E2005-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E2005-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path stroke-width="0" id="E2005-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 462Q176 523 208 573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 577T585 459T569 456Q549 456 549 465Q549 471 550 475Q550 478 551 494T553 520Q553 554 544 579T526 616T501 641Q465 662 419 662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 -18T65 -22Q52 -22 52 -14Q52 -11 110 221Q112 227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 153Q144 114 160 87T203 47T255 29T308 24Z"></path><path stroke-width="0" id="E2005-MJMAIN-3B" d="M78 370Q78 394 95 412T138 430Q162 430 180 414T199 371Q199 346 182 328T139 310T96 327T78 370ZM78 60Q78 85 94 103T137 121Q202 121 202 8Q202 -44 183 -94T144 -169T118 -194Q115 -194 106 -186T95 -174Q94 -171 107 -155T137 -107T160 -38Q161 -32 162 -22T165 -4T165 4Q165 5 161 4T142 0Q110 0 94 18T78 
60Z"></path><path stroke-width="0" id="E2005-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E2005-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E2005-MJMATHI-3C0" x="0" y="0"></use><use xlink:href="#E2005-MJMAIN-28" x="573" y="0"></use><g transform="translate(962,0)"><use xlink:href="#E2005-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2005-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E2005-MJMAIN-2223" x="2345" y="0"></use><g transform="translate(2900,0)"><use xlink:href="#E2005-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2005-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E2005-MJMAIN-3B" x="3869" y="0"></use><use xlink:href="#E2005-MJMATHI-3B8" x="4313" y="0"></use><use xlink:href="#E2005-MJMAIN-29" x="4782" y="0"></use></g></svg></span><script type="math/tex">\pi(A_t \mid S_t;\theta)</script><span> as the output; that is, a sample with a larger </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="2.632ex" height="2.228ex" viewBox="0 -749.6 1133.3 959.2" role="img" focusable="false" style="vertical-align: -0.487ex;"><defs><path stroke-width="0" id="E2010-MJMAIN-3A8" d="M340 622Q338 623 335 625T331 
629T325 631T314 634T298 635T274 636T239 637H212V683H224Q248 680 389 680T554 683H566V637H539Q479 637 464 635T439 622L438 407Q438 192 439 192Q443 193 449 195T474 207T507 232T536 276T557 344Q560 365 562 417T573 493Q587 536 620 544Q627 546 671 546H715L722 540V515Q714 509 708 509Q680 505 671 476T658 392T644 307Q599 177 451 153L438 151V106L439 61Q446 54 451 52T476 48T539 46H566V0H554Q530 3 389 3T224 0H212V46H239Q259 46 273 46T298 47T314 48T325 51T331 54T335 57T340 61V151Q126 178 117 406Q115 503 69 509Q55 509 55 526Q55 541 59 543T86 546H107H120Q150 546 161 543T184 528Q198 514 204 493Q212 472 213 420T226 316T272 230Q287 216 303 207T330 194L339 192Q340 192 340 407V622Z"></path><path stroke-width="0" id="E2010-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E2010-MJMAIN-3A8" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2010-MJMATHI-74" x="1100" y="-213"></use></g></svg></span><script type="math/tex">\Psi_t</script><span> receives a larger weight and therefore a larger update, which is what the algorithm intends. In addition, training on all samples of an episode as one </span><code>batch</code><span> right after the episode ends sometimes works very well; the corresponding function is </span><code>train_on_batch</code><span>. This may be because such an update directly matches the update rule </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="35.658ex" height="6.955ex" viewBox="0 -1746.4 15352.5 2994.3" role="img" focusable="false" style="vertical-align: -2.899ex;"><defs><path stroke-width="0" 
id="E2007-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path><path stroke-width="0" id="E2007-MJMAIN-2190" d="M944 261T944 250T929 230H165Q167 228 182 216T211 189T244 152T277 96T303 25Q308 7 308 0Q308 -11 288 -11Q281 -11 278 -11T272 -7T267 2T263 21Q245 94 195 151T73 236Q58 242 55 247Q55 254 59 257T73 264Q121 283 158 314T215 375T247 434T264 480L267 497Q269 503 270 505T275 509T288 511Q308 511 308 500Q308 493 303 475Q293 438 278 406T246 352T215 315T185 287T165 270H929Q944 261 944 250Z"></path><path stroke-width="0" id="E2007-MJMAIN-2B" d="M56 237T56 250T70 270H369V420L370 570Q380 583 389 583Q402 583 409 568V270H707Q722 262 722 250T707 230H409V-68Q401 -82 391 -82H389H387Q375 -82 369 -68V230H70Q56 237 56 250Z"></path><path stroke-width="0" id="E2007-MJMATHI-3B1" d="M34 156Q34 270 120 356T309 442Q379 442 421 402T478 304Q484 275 485 237V208Q534 282 560 374Q564 388 566 390T582 393Q603 393 603 385Q603 376 594 346T558 261T497 161L486 147L487 123Q489 67 495 47T514 26Q528 28 540 37T557 60Q559 67 562 68T577 70Q597 70 597 62Q597 56 591 43Q579 19 556 5T512 -10H505Q438 -10 414 62L411 69L400 61Q390 53 370 41T325 18T267 -2T203 -11Q124 -11 79 39T34 156ZM208 26Q257 26 306 47T379 90L403 112Q401 255 396 290Q382 405 304 405Q235 405 183 332Q156 292 139 224T121 120Q121 71 146 49T208 26Z"></path><path stroke-width="0" id="E2007-MJSZ2-2211" d="M60 948Q63 950 665 950H1267L1325 815Q1384 677 1388 669H1348L1341 683Q1320 724 1285 761Q1235 809 1174 838T1033 881T882 898T699 902H574H543H251L259 891Q722 258 724 252Q725 250 724 246Q721 243 460 -56L196 -356Q196 -357 407 -357Q459 -357 548 -357T676 -358Q812 -358 896 -353T1063 -332T1204 -283T1307 
-196Q1328 -170 1348 -124H1388Q1388 -125 1381 -145T1356 -210T1325 -294L1267 -449L666 -450Q64 -450 61 -448Q55 -446 55 -439Q55 -437 57 -433L590 177Q590 178 557 222T452 366T322 544L56 909L55 924Q55 945 60 948Z"></path><path stroke-width="0" id="E2007-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path><path stroke-width="0" id="E2007-MJMAIN-3D" d="M56 347Q56 360 70 367H707Q722 359 722 347Q722 336 708 328L390 327H72Q56 332 56 347ZM56 153Q56 168 72 173H708Q722 163 722 153Q722 140 707 133H70Q56 140 56 153Z"></path><path stroke-width="0" id="E2007-MJMAIN-30" d="M96 585Q152 666 249 666Q297 666 345 640T423 548Q460 465 460 320Q460 165 417 83Q397 41 362 16T301 -15T250 -22Q224 -22 198 -16T137 16T82 83Q39 165 39 320Q39 494 96 585ZM321 597Q291 629 250 629Q208 629 178 597Q153 571 145 525T137 333Q137 175 145 125T181 46Q209 16 250 16Q290 16 318 46Q347 76 354 130T362 333Q362 478 354 524T321 597Z"></path><path stroke-width="0" id="E2007-MJMAIN-221E" d="M55 217Q55 305 111 373T254 442Q342 442 419 381Q457 350 493 303L507 284L514 294Q618 442 747 442Q833 442 888 374T944 214Q944 128 889 59T743 -11Q657 -11 580 50Q542 81 506 128L492 147L485 137Q381 -11 252 -11Q166 -11 111 57T55 217ZM907 217Q907 285 869 341T761 397Q740 397 720 392T682 378T648 359T619 335T594 310T574 285T559 263T548 246L543 238L574 198Q605 158 622 138T664 94T714 61T765 51Q827 51 867 100T907 217ZM92 214Q92 145 131 89T239 33Q357 33 456 193L425 233Q364 312 334 337Q285 380 233 380Q171 380 132 331T92 214Z"></path><path stroke-width="0" id="E2007-MJMATHI-3B3" d="M31 249Q11 249 11 258Q11 
275 26 304T66 365T129 418T206 441Q233 441 239 440Q287 429 318 386T371 255Q385 195 385 170Q385 166 386 166L398 193Q418 244 443 300T486 391T508 430Q510 431 524 431H537Q543 425 543 422Q543 418 522 378T463 251T391 71Q385 55 378 6T357 -100Q341 -165 330 -190T303 -216Q286 -216 286 -188Q286 -138 340 32L346 51L347 69Q348 79 348 100Q348 257 291 317Q251 355 196 355Q148 355 108 329T51 260Q49 251 47 251Q45 249 31 249Z"></path><path stroke-width="0" id="E2007-MJMATHI-47" d="M50 252Q50 367 117 473T286 641T490 704Q580 704 633 653Q642 643 648 636T656 626L657 623Q660 623 684 649Q691 655 699 663T715 679T725 690L740 705H746Q760 705 760 698Q760 694 728 561Q692 422 692 421Q690 416 687 415T669 413H653Q647 419 647 422Q647 423 648 429T650 449T651 481Q651 552 619 605T510 659Q492 659 471 656T418 643T357 615T294 567T236 496T189 394T158 260Q156 242 156 221Q156 173 170 136T206 79T256 45T308 28T353 24Q407 24 452 47T514 106Q517 114 529 161T541 214Q541 222 528 224T468 227H431Q425 233 425 235T427 254Q431 267 437 273H454Q494 271 594 271Q634 271 659 271T695 272T707 272Q721 272 721 263Q721 261 719 249Q714 230 709 228Q706 227 694 227Q674 227 653 224Q646 221 643 215T629 164Q620 131 614 108Q589 6 586 3Q584 1 581 1Q571 1 553 21T530 52Q530 53 528 52T522 47Q448 -22 322 -22Q201 -22 126 55T50 252Z"></path><path stroke-width="0" id="E2007-MJMAIN-2207" d="M46 676Q46 679 51 683H781Q786 679 786 676Q786 674 617 326T444 -26Q439 -33 416 -33T388 -26Q385 -22 216 326T46 676ZM697 596Q697 597 445 597T193 596Q195 591 319 336T445 80L697 596Z"></path><path stroke-width="0" id="E2007-MJMAIN-6C" d="M42 46H56Q95 46 103 60V68Q103 77 103 91T103 124T104 167T104 217T104 272T104 329Q104 366 104 407T104 482T104 542T103 586T103 603Q100 622 89 628T44 637H26V660Q26 683 28 683L38 684Q48 685 67 686T104 688Q121 689 141 690T171 693T182 694H185V379Q185 62 186 60Q190 52 198 49Q219 46 247 46H263V0H255L232 1Q209 2 183 2T145 3T107 3T57 1L34 0H26V46H42Z"></path><path stroke-width="0" id="E2007-MJMAIN-6E" d="M41 46H55Q94 46 102 60V68Q102 77 102 
91T102 122T103 161T103 203Q103 234 103 269T102 328V351Q99 370 88 376T43 385H25V408Q25 431 27 431L37 432Q47 433 65 434T102 436Q119 437 138 438T167 441T178 442H181V402Q181 364 182 364T187 369T199 384T218 402T247 421T285 437Q305 442 336 442Q450 438 463 329Q464 322 464 190V104Q464 66 466 59T477 49Q498 46 526 46H542V0H534L510 1Q487 2 460 2T422 3Q319 3 310 0H302V46H318Q379 46 379 62Q380 64 380 200Q379 335 378 343Q372 371 358 385T334 402T308 404Q263 404 229 370Q202 343 195 315T187 232V168V108Q187 78 188 68T191 55T200 49Q221 46 249 46H265V0H257L234 1Q210 2 183 2T145 3Q42 3 33 0H25V46H41Z"></path><path stroke-width="0" id="E2007-MJMATHI-3C0" d="M132 -11Q98 -11 98 22V33L111 61Q186 219 220 334L228 358H196Q158 358 142 355T103 336Q92 329 81 318T62 297T53 285Q51 284 38 284Q19 284 19 294Q19 300 38 329T93 391T164 429Q171 431 389 431Q549 431 553 430Q573 423 573 402Q573 371 541 360Q535 358 472 358H408L405 341Q393 269 393 222Q393 170 402 129T421 65T431 37Q431 20 417 5T381 -10Q370 -10 363 -7T347 17T331 77Q330 86 330 121Q330 170 339 226T357 318T367 358H269L268 354Q268 351 249 275T206 114T175 17Q164 -11 132 -11Z"></path><path stroke-width="0" id="E2007-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E2007-MJMATHI-41" d="M208 74Q208 50 254 46Q272 46 272 35Q272 34 270 22Q267 8 264 4T251 0Q249 0 239 0T205 1T141 2Q70 2 50 0H42Q35 7 35 11Q37 38 48 46H62Q132 49 164 96Q170 102 345 401T523 704Q530 716 547 716H555H572Q578 707 578 706L606 383Q634 60 636 57Q641 46 701 46Q726 46 726 36Q726 34 723 22Q720 7 718 4T704 0Q701 0 690 0T651 1T578 2Q484 2 455 0H443Q437 6 437 9T439 27Q443 40 445 43L449 46H469Q523 49 533 63L521 213H283L249 155Q208 86 208 74ZM516 260Q516 271 504 416T490 562L463 519Q447 492 400 412L310 260L413 259Q516 259 516 260Z"></path><path 
stroke-width="0" id="E2007-MJMAIN-2223" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path stroke-width="0" id="E2007-MJMATHI-53" d="M308 24Q367 24 416 76T466 197Q466 260 414 284Q308 311 278 321T236 341Q176 383 176 462Q176 523 208 573T273 648Q302 673 343 688T407 704H418H425Q521 704 564 640Q565 640 577 653T603 682T623 704Q624 704 627 704T632 705Q645 705 645 698T617 577T585 459T569 456Q549 456 549 465Q549 471 550 475Q550 478 551 494T553 520Q553 554 544 579T526 616T501 641Q465 662 419 662Q362 662 313 616T263 510Q263 480 278 458T319 427Q323 425 389 408T456 390Q490 379 522 342T554 242Q554 216 546 186Q541 164 528 137T492 78T426 18T332 -20Q320 -22 298 -22Q199 -22 144 33L134 44L106 13Q83 -14 78 -18T65 -22Q52 -22 52 -14Q52 -11 110 221Q112 227 130 227H143Q149 221 149 216Q149 214 148 207T144 186T142 153Q144 114 160 87T203 47T255 29T308 24Z"></path><path stroke-width="0" id="E2007-MJMAIN-3B" d="M78 370Q78 394 95 412T138 430Q162 430 180 414T199 371Q199 346 182 328T139 310T96 327T78 370ZM78 60Q78 85 94 103T137 121Q202 121 202 8Q202 -44 183 -94T144 -169T118 -194Q115 -194 106 -186T95 -174Q94 -171 107 -155T137 -107T160 -38Q161 -32 162 -22T165 -4T165 4Q165 5 161 4T142 0Q110 0 94 18T78 60Z"></path><path stroke-width="0" id="E2007-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E2007-MJMATHI-3B8" x="0" y="0"></use><use xlink:href="#E2007-MJMAIN-2190" x="746" y="0"></use><use xlink:href="#E2007-MJMATHI-3B8" x="2024" y="0"></use><use xlink:href="#E2007-MJMAIN-2B" x="2715" y="0"></use><use xlink:href="#E2007-MJMATHI-3B1" x="3716" y="0"></use><g transform="translate(4522,0)"><use 
xlink:href="#E2007-MJSZ2-2211" x="0" y="0"></use><g transform="translate(142,-1088)"><use transform="scale(0.707)" xlink:href="#E2007-MJMATHI-74" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2007-MJMAIN-3D" x="361" y="0"></use><use transform="scale(0.707)" xlink:href="#E2007-MJMAIN-30" x="1139" y="0"></use></g><g transform="translate(93,1150)"><use transform="scale(0.707)" xlink:href="#E2007-MJMAIN-2B" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2007-MJMAIN-221E" x="778" y="0"></use></g></g><g transform="translate(6133,0)"><use xlink:href="#E2007-MJMATHI-3B3" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2007-MJMATHI-74" x="778" y="583"></use></g><g transform="translate(7039,0)"><use xlink:href="#E2007-MJMATHI-47" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2007-MJMATHI-74" x="1111" y="-213"></use></g><use xlink:href="#E2007-MJMAIN-2207" x="8180" y="0"></use><g transform="translate(9180,0)"><use xlink:href="#E2007-MJMAIN-6C"></use><use xlink:href="#E2007-MJMAIN-6E" x="278" y="0"></use></g><use xlink:href="#E2007-MJMATHI-3C0" x="10180" y="0"></use><use xlink:href="#E2007-MJMAIN-28" x="10753" y="0"></use><g transform="translate(11142,0)"><use xlink:href="#E2007-MJMATHI-41" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2007-MJMATHI-74" x="1060" y="-213"></use></g><use xlink:href="#E2007-MJMAIN-2223" x="12525" y="0"></use><g transform="translate(13081,0)"><use xlink:href="#E2007-MJMATHI-53" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2007-MJMATHI-74" x="866" y="-213"></use></g><use xlink:href="#E2007-MJMAIN-3B" x="14049" y="0"></use><use xlink:href="#E2007-MJMATHI-3B8" x="14494" y="0"></use><use xlink:href="#E2007-MJMAIN-29" x="14963" y="0"></use></g></svg></span><script type="math/tex">\displaystyle \theta \leftarrow \theta + \alpha \sum_{t=0}^{+\infty} \gamma^t G_t \nabla \ln \pi(A_t \mid S_t; \theta)</script><span>, without going through update rule </span><span class="MathJax_SVG" 
tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="2.968ex" height="2.71ex" viewBox="0 -832.7 1278 1166.9" role="img" focusable="false" style="vertical-align: -0.776ex;"><defs><path stroke-width="0" id="E2008-MJMAIN-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path stroke-width="0" id="E2008-MJMAIN-33" d="M127 463Q100 463 85 480T69 524Q69 579 117 622T233 665Q268 665 277 664Q351 652 390 611T430 522Q430 470 396 421T302 350L299 348Q299 347 308 345T337 336T375 315Q457 262 457 175Q457 96 395 37T238 -22Q158 -22 100 21T42 130Q42 158 60 175T105 193Q133 193 151 175T169 130Q169 119 166 110T159 94T148 82T136 74T126 70T118 67L114 66Q165 21 238 21Q293 21 321 74Q338 107 338 175V195Q338 290 274 322Q259 328 213 329L171 330L168 332Q166 335 166 348Q166 366 174 366Q202 366 232 371Q266 376 294 413T322 525V533Q322 590 287 612Q265 626 240 626Q208 626 181 615T143 592T132 580H135Q138 579 143 578T153 573T165 566T175 555T183 540T186 520Q186 498 172 481T127 463Z"></path><path stroke-width="0" id="E2008-MJMAIN-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><a class="mjx-svg-href" xlink:href="#mjx-eqn-eq%3A3"><rect width="1278" height="1000" y="-250" fill="none" stroke="none" pointer-events="all"></rect><g class="MathJax_ref"><use xlink:href="#E2008-MJMAIN-28"></use><use xlink:href="#E2008-MJMAIN-33" x="389" y="0"></use><use xlink:href="#E2008-MJMAIN-29" x="889" 
y="0"></use></g></a></g></svg></span><script type="math/tex">\eqref{eq:3}</script><span> to update the parameter </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="1.089ex" height="1.939ex" viewBox="0 -749.6 469 834.7" role="img" focusable="false" style="vertical-align: -0.198ex;"><defs><path stroke-width="0" id="E2009-MJMATHI-3B8" d="M35 200Q35 302 74 415T180 610T319 704Q320 704 327 704T339 705Q393 701 423 656Q462 596 462 495Q462 380 417 261T302 66T168 -10H161Q125 -10 99 10T60 63T41 130T35 200ZM383 566Q383 668 330 668Q294 668 260 623T204 521T170 421T157 371Q206 370 254 370L351 371Q352 372 359 404T375 484T383 566ZM113 132Q113 26 166 26Q181 26 198 36T239 74T287 161T335 307L340 324H145Q145 321 136 286T120 208T113 132Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E2009-MJMATHI-3B8" x="0" y="0"></use></g></svg></span><script type="math/tex">\theta</script><span> indirectly. The code after the change is as follows:</span></p><pre spellcheck="false" class="md-fences md-end-block md-fences-with-lineno ty-contain-cm modeLoaded" lang="python"><div class="CodeMirror cm-s-inner CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; height: 0px; top: 0px; left: 33px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"></textarea></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"></div><div class="CodeMirror-gutter-filler" cm-not-content="true"></div><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 29px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: 
relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre><div class="CodeMirror-linenumber CodeMirror-gutter-elt"><div>3</div></div></div><div class="CodeMirror-measure"></div><div style="position: relative; z-index: 1;"></div><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"></div><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: -29px; width: 29px;"></div><div class="CodeMirror-gutter-wrapper CodeMirror-activeline-gutter" style="left: -29px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt CodeMirror-linenumber-show" style="left: 0px; width: 20px;">1</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">sample_weight</span> = <span class="cm-variable">df</span>[<span class="cm-string">"psi"</span>].<span class="cm-property">values</span>[:, <span class="cm-variable">np</span>.<span class="cm-property">newaxis</span>]</span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -29px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 20px;">2</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">y</span> = <span class="cm-variable">np</span>.<span class="cm-property">eye</span>(<span class="cm-variable-2">self</span>.<span class="cm-property">action_n</span>)[<span class="cm-variable">df</span>[<span class="cm-string">"action"</span>]]</span></pre></div><div style="position: relative;" class=""><div class="CodeMirror-gutter-wrapper" style="left: -29px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt CodeMirror-linenumber-show" style="left: 0px; width: 20px;">3</div></div><pre class=" 
CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable-2">self</span>.<span class="cm-property">policy_net</span>.<span class="cm-property">train_on_batch</span>(<span class="cm-variable">x</span>, <span class="cm-variable">y</span>, <span class="cm-variable">sample_weight</span>=<span class="cm-variable">sample_weight</span>)</span></pre></div></div></div></div></div></div><div style="position: absolute; height: 0px; width: 1px; border-bottom: 0px solid transparent; top: 96px;"></div><div class="CodeMirror-gutters" style="height: 96px;"><div class="CodeMirror-gutter CodeMirror-linenumbers" style="width: 28px;"></div></div></div></div></pre><p><span>Because the off-policy code uses a different loss function, the non-convergence problem described above does not arise there; the modified code can nevertheless be used for off-policy training as well, since the underlying idea is the same. Following the book's source code, we merge the on-policy and off-policy agents into one class, standardize </span><span class="MathJax_SVG" tabindex="-1" style="font-size: 100%; display: inline-block;"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="2.632ex" height="2.228ex" viewBox="0 -749.6 1133.3 959.2" role="img" focusable="false" style="vertical-align: -0.487ex;"><defs><path stroke-width="0" id="E2010-MJMAIN-3A8" d="M340 622Q338 623 335 625T331 629T325 631T314 634T298 635T274 636T239 637H212V683H224Q248 680 389 680T554 683H566V637H539Q479 637 464 635T439 622L438 407Q438 192 439 192Q443 193 449 195T474 207T507 232T536 276T557 344Q560 365 562 417T573 493Q587 536 620 544Q627 546 671 546H715L722 540V515Q714 509 708 509Q680 505 671 476T658 392T644 307Q599 177 451 153L438 151V106L439 61Q446 54 451 52T476 48T539 46H566V0H554Q530 3 389 3T224 0H212V46H239Q259 46 273 46T298 47T314 48T325 51T331 54T335 57T340 61V151Q126 178 117 406Q115 503 69 509Q55 509 55 526Q55 541 59 543T86 546H107H120Q150 546 161 543T184 528Q198 514 204 493Q212 472 213 420T226 316T272 230Q287 216 303 207T330 194L339 192Q340 192 340 407V622Z"></path><path stroke-width="0" id="E2010-MJMATHI-74" d="M26 385Q19 392 19 395Q19 399 22 411T27 425Q29 430 36 430T87 431H140L159 511Q162 522 166 540T173 566T179 
586T187 603T197 615T211 624T229 626Q247 625 254 615T261 596Q261 589 252 549T232 470L222 433Q222 431 272 431H323Q330 424 330 420Q330 398 317 385H210L174 240Q135 80 135 68Q135 26 162 26Q197 26 230 60T283 144Q285 150 288 151T303 153H307Q322 153 322 145Q322 142 319 133Q314 117 301 95T267 48T216 6T155 -11Q125 -11 98 4T59 56Q57 64 57 83V101L92 241Q127 382 128 383Q128 385 77 385H26Z"></path></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"><use xlink:href="#E2010-MJMAIN-3A8" x="0" y="0"></use><use transform="scale(0.707)" xlink:href="#E2010-MJMATHI-74" x="1100" y="-213"></use></g></svg></span><script type="math/tex">\Psi_t</script><span> to keep network training stable, and set the behavior policy to a random policy. The final modified agent class is as follows:</span></p><pre spellcheck="false" class="md-fences md-end-block md-fences-with-lineno ty-contain-cm modeLoaded" lang="python" style="break-inside: unset;"><div class="CodeMirror cm-s-inner CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; height: 0px; top: 0px; left: 43px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"></textarea></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"></div><div class="CodeMirror-gutter-filler" cm-not-content="true"></div><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 39px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre><div class="CodeMirror-linenumber CodeMirror-gutter-elt"><div>54</div></div></div><div class="CodeMirror-measure"></div><div style="position: relative; z-index: 1;"></div><div 
class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"></div><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: -39px; width: 39px;"></div><div class="CodeMirror-gutter-wrapper CodeMirror-activeline-gutter" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt CodeMirror-linenumber-show" style="left: 0px; width: 30px;">1</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">class</span> <span class="cm-def">VPG</span>():</span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">2</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp;<span class="cm-keyword">def</span> <span class="cm-def">__init__</span>(<span class="cm-variable-2">self</span>, <span class="cm-variable">env</span>, <span class="cm-variable">policy_kwargs</span>, <span class="cm-variable">baseline_kwargs</span>=<span class="cm-keyword">None</span>, <span class="cm-variable">gamma</span>=<span class="cm-number">0.99</span>, <span class="cm-variable">offpolicy</span>=<span class="cm-keyword">False</span>):</span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">3</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-variable-2">self</span>.<span class="cm-property">action_n</span> = <span class="cm-variable">env</span>.<span 
class="cm-property">action_space</span>.<span class="cm-property">n</span></span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">4</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-variable-2">self</span>.<span class="cm-property">gamma</span>= <span class="cm-variable">gamma</span></span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">5</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-variable-2">self</span>.<span class="cm-property">trajectory</span> = []</span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">6</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="">​</span></span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">7</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-keyword">if</span> <span class="cm-keyword">not</span> <span class="cm-variable">offpolicy</span>:</span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 
30px;">8</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-variable-2">self</span>.<span class="cm-property">random_behavior</span> = <span class="cm-keyword">False</span></span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">9</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-variable">policy_loss</span> = <span class="cm-variable">keras</span>.<span class="cm-property">losses</span>.<span class="cm-property">categorical_crossentropy</span></span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt CodeMirror-linenumber-show" style="left: 0px; width: 30px;">10</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-keyword">else</span>:</span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">11</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-variable-2">self</span>.<span class="cm-property">random_behavior</span> = <span class="cm-keyword">True</span></span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">12</div></div><pre class=" CodeMirror-line " 
role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-variable">policy_loss</span> = <span class="cm-keyword">lambda</span> <span class="cm-variable">y_true</span>, <span class="cm-variable">y_pred</span>: <span class="cm-operator">-</span><span class="cm-variable">tf</span>.<span class="cm-property">reduce_sum</span>(<span class="cm-variable">y_true</span> <span class="cm-operator">*</span> <span class="cm-variable">y_pred</span>, <span class="cm-variable">axis</span>=<span class="cm-operator">-</span><span class="cm-number">1</span>)</span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">13</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="">​</span></span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">14</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-variable-2">self</span>.<span class="cm-property">policy_net</span> = <span class="cm-variable-2">self</span>.<span class="cm-property">build_network</span>(<span class="cm-variable">output_size</span>=<span class="cm-variable-2">self</span>.<span class="cm-property">action_n</span>, <span class="cm-error">\</span></span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">15</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; 
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-variable">output_activation</span>=<span class="cm-variable">tf</span>.<span class="cm-property">nn</span>.<span class="cm-property">softmax</span>, <span class="cm-variable">loss</span>=<span class="cm-variable">policy_loss</span>, <span class="cm-operator">**</span><span class="cm-variable">policy_kwargs</span>)</span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">16</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-keyword">if</span> <span class="cm-variable">baseline_kwargs</span>:</span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">17</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-variable-2">self</span>.<span class="cm-property">baseline_net</span> = <span class="cm-variable-2">self</span>.<span class="cm-property">build_network</span>(<span class="cm-variable">output_size</span>=<span class="cm-number">1</span>, <span class="cm-operator">**</span><span class="cm-variable">baseline_kwargs</span>)</span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">18</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="">​</span></span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber 
CodeMirror-gutter-elt" style="left: 0px; width: 30px;">19</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp;<span class="cm-keyword">def</span> <span class="cm-def">build_network</span>(<span class="cm-variable-2">self</span>, <span class="cm-variable">hidden_sizes</span>, <span class="cm-variable">output_size</span>, <span class="cm-variable">activation</span>=<span class="cm-variable">tf</span>.<span class="cm-property">nn</span>.<span class="cm-property">relu</span>, <span class="cm-error">\</span></span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt CodeMirror-linenumber-show" style="left: 0px; width: 30px;">20</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-variable">output_activation</span>=<span class="cm-keyword">None</span>, <span class="cm-variable">loss</span>=<span class="cm-variable">keras</span>.<span class="cm-property">losses</span>.<span class="cm-property">mse</span>, <span class="cm-variable">learning_rate</span>=<span class="cm-number">0.01</span>):</span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">21</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-variable">model</span> = <span class="cm-variable">keras</span>.<span class="cm-property">Sequential</span>()</span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 
30px;">22</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-keyword">for</span> <span class="cm-variable">hidden_size</span> <span class="cm-keyword">in</span> <span class="cm-variable">hidden_sizes</span>:</span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">23</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-variable">model</span>.<span class="cm-property">add</span>(<span class="cm-variable">keras</span>.<span class="cm-property">layers</span>.<span class="cm-property">Dense</span>(<span class="cm-variable">units</span>=<span class="cm-variable">hidden_size</span>, <span class="cm-variable">activation</span>=<span class="cm-variable">activation</span>))</span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">24</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-variable">model</span>.<span class="cm-property">add</span>(<span class="cm-variable">keras</span>.<span class="cm-property">layers</span>.<span class="cm-property">Dense</span>(<span class="cm-variable">units</span>=<span class="cm-variable">output_size</span>, <span class="cm-variable">activation</span>=<span class="cm-variable">output_activation</span>))</span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">25</div></div><pre 
class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-variable">model</span>.<span class="cm-property">compile</span>(<span class="cm-variable">loss</span>=<span class="cm-variable">loss</span>, <span class="cm-variable">optimizer</span>=<span class="cm-variable">keras</span>.<span class="cm-property">optimizers</span>.<span class="cm-property">Adam</span>(<span class="cm-variable">learning_rate</span>=<span class="cm-variable">learning_rate</span>))</span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">26</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-keyword">return</span> <span class="cm-variable">model</span></span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">27</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="">​</span></span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">28</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp;<span class="cm-keyword">def</span> <span class="cm-def">choose_action</span>(<span class="cm-variable-2">self</span>, <span class="cm-variable">state</span>):</span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 
30px;">29</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-variable">probs</span> = <span class="cm-variable-2">self</span>.<span class="cm-property">policy_net</span>.<span class="cm-property">predict</span>(<span class="cm-variable">state</span>[<span class="cm-variable">np</span>.<span class="cm-property">newaxis</span>])[<span class="cm-number">0</span>]</span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt CodeMirror-linenumber-show" style="left: 0px; width: 30px;">30</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-keyword">return</span> <span class="cm-variable">np</span>.<span class="cm-property">random</span>.<span class="cm-property">choice</span>(<span class="cm-variable-2">self</span>.<span class="cm-property">action_n</span>, <span class="cm-variable">p</span>=<span class="cm-variable">probs</span>)</span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">31</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="">​</span></span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">32</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp;<span class="cm-keyword">def</span> <span class="cm-def">learn</span>(<span class="cm-variable-2">self</span>, <span class="cm-variable">state</span>, 
<span class="cm-variable">action</span>, <span class="cm-variable">reward</span>, <span class="cm-variable">done</span>):</span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">33</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-variable-2">self</span>.<span class="cm-property">trajectory</span>.<span class="cm-property">append</span>((<span class="cm-variable">state</span>, <span class="cm-variable">action</span>, <span class="cm-variable">reward</span>))</span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">34</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="">​</span></span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">35</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-keyword">if</span> <span class="cm-variable">done</span>:</span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">36</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-variable">df</span> = <span class="cm-variable">pd</span>.<span class="cm-property">DataFrame</span>(<span 
class="cm-variable-2">self</span>.<span class="cm-property">trajectory</span>, <span class="cm-variable">columns</span>=[<span class="cm-string">"state"</span>, <span class="cm-string">"action"</span>, <span class="cm-string">"reward"</span>])</span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">37</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-variable">df</span>[<span class="cm-string">"discount"</span>] = <span class="cm-variable-2">self</span>.<span class="cm-property">gamma</span> <span class="cm-operator">**</span> <span class="cm-variable">df</span>.<span class="cm-property">index</span>.<span class="cm-property">to_series</span>()</span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">38</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-variable">df</span>[<span class="cm-string">"psi"</span>] = (<span class="cm-variable">df</span>[<span class="cm-string">"discount"</span>] <span class="cm-operator">*</span> <span class="cm-variable">df</span>[<span class="cm-string">"reward"</span>])[::<span class="cm-operator">-</span><span class="cm-number">1</span>].<span class="cm-property">cumsum</span>()</span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">39</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span 
cm-text="">​</span></span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt CodeMirror-linenumber-show" style="left: 0px; width: 30px;">40</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-variable">x</span> = <span class="cm-variable">np</span>.<span class="cm-property">stack</span>(<span class="cm-variable">df</span>[<span class="cm-string">"state"</span>])</span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">41</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-keyword">if</span> <span class="cm-builtin">hasattr</span>(<span class="cm-variable-2">self</span>, <span class="cm-string">"baseline_net"</span>):</span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">42</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-variable">df</span>[<span class="cm-string">"return"</span>] = <span class="cm-variable">df</span>[<span class="cm-string">"psi"</span>] <span class="cm-operator">/</span> <span class="cm-variable">df</span>[<span class="cm-string">"discount"</span>]</span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">43</div></div><pre class=" 
CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-variable">y</span> = <span class="cm-variable">df</span>[<span class="cm-string">"return"</span>].<span class="cm-property">values</span>[:, <span class="cm-variable">np</span>.<span class="cm-property">newaxis</span>]</span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">44</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-variable-2">self</span>.<span class="cm-property">baseline_net</span>.<span class="cm-property">train_on_batch</span>(<span class="cm-variable">x</span>, <span class="cm-variable">y</span>)</span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">45</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-variable">df</span>[<span class="cm-string">"baseline"</span>] = <span class="cm-variable-2">self</span>.<span class="cm-property">baseline_net</span>.<span class="cm-property">predict</span>(<span class="cm-variable">x</span>)</span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">46</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span 
class="cm-variable">df</span>[<span class="cm-string">"psi"</span>] -= (<span class="cm-variable">df</span>[<span class="cm-string">"discount"</span>] <span class="cm-operator">*</span> <span class="cm-variable">df</span>[<span class="cm-string">"baseline"</span>])</span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">47</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-keyword">if</span> <span class="cm-variable-2">self</span>.<span class="cm-property">random_behavior</span>:</span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">48</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-variable">df</span>[<span class="cm-string">"psi"</span>] *= <span class="cm-variable-2">self</span>.<span class="cm-property">action_n</span></span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">49</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="">​</span></span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt CodeMirror-linenumber-show" style="left: 0px; width: 30px;">50</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; 
&nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-variable">df</span>[<span class="cm-string">"psi"</span>] = (<span class="cm-variable">df</span>[<span class="cm-string">"psi"</span>] <span class="cm-operator">-</span> <span class="cm-variable">df</span>[<span class="cm-string">"psi"</span>].<span class="cm-property">mean</span>()) <span class="cm-operator">/</span> <span class="cm-variable">df</span>[<span class="cm-string">"psi"</span>].<span class="cm-property">std</span>()</span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">51</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-variable">sample_weight</span> = <span class="cm-variable">df</span>[<span class="cm-string">"psi"</span>].<span class="cm-property">values</span>[:, <span class="cm-variable">np</span>.<span class="cm-property">newaxis</span>]</span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">52</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-variable">y</span> = <span class="cm-variable">np</span>.<span class="cm-property">eye</span>(<span class="cm-variable-2">self</span>.<span class="cm-property">action_n</span>)[<span class="cm-variable">df</span>[<span class="cm-string">"action"</span>]]</span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt" style="left: 0px; width: 30px;">53</div></div><pre class=" CodeMirror-line " role="presentation"><span 
role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-variable-2">self</span>.<span class="cm-property">policy_net</span>.<span class="cm-property">train_on_batch</span>(<span class="cm-variable">x</span>, <span class="cm-variable">y</span>, <span class="cm-variable">sample_weight</span>=<span class="cm-variable">sample_weight</span>)</span></pre></div><div style="position: relative;"><div class="CodeMirror-gutter-wrapper" style="left: -39px;"><div class="CodeMirror-linenumber CodeMirror-gutter-elt CodeMirror-linenumber-show" style="left: 0px; width: 30px;">54</div></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="cm-variable-2">self</span>.<span class="cm-property">trajectory</span> = []</span></pre></div></div></div></div></div></div><div style="position: absolute; height: 0px; width: 1px; border-bottom: 0px solid transparent; top: 1728px;"></div><div class="CodeMirror-gutters" style="height: 1728px;"><div class="CodeMirror-gutter CodeMirror-linenumbers" style="width: 38px;"></div></div></div></div></pre><p><em><span>(Because of the learning rate and the randomness of exploration, training for the specified number of episodes is not guaranteed to converge, or even to improve, on every run. You can try running it several times, adjusting the learning rate, or increasing the number of training episodes. You can also modify the code to save the model after training, then load it before the next training session and continue training with an adjusted learning rate.)</span></em></p></div>
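<p><span>The save-and-resume idea in the note above can be sketched as follows. This is a minimal, framework-agnostic illustration and not the chapter's own code: the helpers save_checkpoint and load_checkpoint, the lr_scale parameter, the file name, and the NumPy array standing in for the policy network's weights are all assumptions made for the example. With a Keras model such as policy_net, the model's own weight saving and loading methods would normally take the place of np.savez and np.load.</span></p>

```python
import os
import tempfile

import numpy as np


def save_checkpoint(path, weights, learning_rate):
    """Store the policy weights together with the learning rate used so far."""
    np.savez(path, weights=weights, learning_rate=learning_rate)


def load_checkpoint(path, lr_scale=1.0):
    """Reload the weights and return them with a rescaled learning rate."""
    with np.load(path) as data:
        weights = data["weights"].copy()
        learning_rate = float(data["learning_rate"]) * lr_scale
    return weights, learning_rate


# First run: hypothetical weights of a linear softmax policy
# (4 state features, 2 actions), standing in for a trained network.
rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 2))
checkpoint_path = os.path.join(tempfile.mkdtemp(), "policy_checkpoint.npz")
save_checkpoint(checkpoint_path, weights, learning_rate=0.01)

# Later run: resume from the checkpoint with a ten times smaller learning rate.
restored_weights, restored_lr = load_checkpoint(checkpoint_path, lr_scale=0.1)
```

<p><span>Storing the learning rate next to the weights lets a later run decide how far to shrink it relative to the previous run, instead of hard-coding a new value.</span></p>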
</body>
</html>